
The attention model for NLP was introduced in the landmark NLP paper from Bengio's group, "Neural Machine Translation by Jointly Learning to Align and Translate".

Attention is just a weighting, and it does not conflict with RNNs. It is better described as a mechanism than as a model.
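To make "just a weighting" concrete, here is the weighting written out. The notation follows my recollection of the paper, so treat it as a sketch: $h_j$ is the encoder annotation for source position $j$, $s_{i-1}$ is the previous decoder state, $a(\cdot)$ is a small learned alignment network, and $T_x$ is the source length.

```latex
e_{ij} = a(s_{i-1}, h_j), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{T_x} \alpha_{ij} \, h_j
```

The context vector $c_i$ is a weighted average of the annotations, and the RNN decoder consumes it like any other input. That is why attention sits on top of an RNN rather than replacing it.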

In the "Encoder-Decoder Approaches" paper, Kyunghyun Cho found that the translation quality of the RNN Encoder-Decoder degrades as sentence length increases.

Two solutions come to mind: add more layers, or use a new model:

1. larger model architecture:

"Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector."


2. new model architecture:

"In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly."

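The paper introduces a bidirectional recurrent neural network (BiRNN) to address the fixed length of the semantic (context) vector: the encoder keeps one annotation per source word instead of a single vector. It also introduces a Soft Attention Mechanism (softmax normalization) to compute normalized weights over this variable-length sequence of vectors.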
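The weighting step can be sketched in a few lines of plain Python. This is only an illustration, not the paper's implementation: the names `softmax` and `soft_attention` are hypothetical, the toy vectors are made up, and a dot product stands in for the paper's learned alignment network. The point is that the same function accepts any number of annotations and always returns a context vector of fixed size.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(query, annotations):
    """Return (weights, context) for one decoder step.

    query:       decoder-side vector (list of floats)
    annotations: one encoder vector per source position; any length
    Assumption: scores are plain dot products, NOT the learned
    alignment model used in the paper.
    """
    scores = [sum(q * h for q, h in zip(query, ann)) for ann in annotations]
    weights = softmax(scores)
    dim = len(annotations[0])
    # Context = weighted sum of annotations; its size does not depend
    # on how many source positions there are.
    context = [
        sum(w * ann[d] for w, ann in zip(weights, annotations))
        for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    query = [1.0, 0.0]
    short_src = [[1.0, 0.0], [0.0, 1.0]]
    long_src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    for src in (short_src, long_src):
        weights, context = soft_attention(query, src)
        print(len(src), [round(w, 3) for w in weights], len(context))
```

The weights returned for each decoder step are also what gets plotted as a soft alignment between source and target words.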

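Finally, the Soft Attention Mechanism is quite interesting in its own right: it can discover the hidden relationships between the input layer and the output layer by itself (weakly supervised learning).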

Have a look at this article on applications of the Soft Attention Mechanism in machine translation, image caption generation, video clip description, and speech recognition.

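My understanding is that this should be possible; in that case it turns into a purely end-to-end (end2end) memory network. The concrete approach would need experiments to back it up.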



