Neural machine translation?

The attention model for NLP was introduced in the landmark paper from Bengio's group (Bahdanau, Cho, and Bengio), "Neural Machine Translation by Jointly Learning to Align and Translate".


Attention is just a weighting scheme, and it does not conflict with RNNs. It is less a model than a mechanism.
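To make "attention is just weighting" concrete, here is a minimal NumPy sketch (the function name and the dot-product score are my own illustrative choices, not the paper's exact alignment model): score each encoder hidden state against the current decoder state, softmax-normalize, and take the weighted sum as the context vector.

```python
import numpy as np

def attention_pool(decoder_state, encoder_states):
    """Attention as plain weighting: score every encoder state against the
    current decoder state, softmax-normalize, and take the weighted sum."""
    # encoder_states: (T, d) hidden states; decoder_state: (d,)
    scores = encoder_states @ decoder_state        # (T,) one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax -> weights sum to 1
    context = weights @ encoder_states             # (d,) weighted sum of encoder states
    return context, weights
```

Whatever produces the encoder and decoder states (RNN, CNN, anything else), this weighting step stays the same, which is why it reads as a mechanism rather than a model.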


In the paper "On the Properties of Neural Machine Translation: Encoder-Decoder Approaches", Kyunghyun Cho found that the translation quality of the RNN Encoder-Decoder degrades as sentences get longer.

Two fixes come to mind: add layers, or design a new model:

1. larger model architecture:

[1409.3215] Sequence to Sequence Learning with Neural Networks

"Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector."

They used GPU parallelism to speed up training of a multilayered LSTM (similar to the asker's idea).

2. new model architecture:

[1409.0473] Neural Machine Translation by Jointly Learning to Align and Translate

"In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly."

It introduces a bidirectional recurrent neural network (BiRNN) so the source sentence no longer has to be squeezed into a single fixed-length vector, and a soft attention mechanism (softmax normalization) that assigns normalized weights over the variable-length sequence of annotations.
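For reference, a sketch of the additive (Bahdanau-style) scoring and softmax normalization described above, assuming the BiRNN annotations are stacked as rows; the parameter names follow the paper's alignment model, but the shapes and function name are illustrative assumptions:

```python
import numpy as np

def soft_attention(s_prev, annotations, W_a, U_a, v_a):
    """Additive (Bahdanau-style) soft attention over BiRNN annotations.

    s_prev:      previous decoder state s_{i-1}, shape (n,)
    annotations: BiRNN annotations h_1..h_T stacked as rows, shape (T, 2n)
    W_a (m, n), U_a (m, 2n), v_a (m,): alignment-model parameters
    """
    # e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j)
    e = np.tanh(s_prev @ W_a.T + annotations @ U_a.T) @ v_a   # (T,) alignment scores
    # a_ij = softmax over source positions j
    a = np.exp(e - e.max())
    a /= a.sum()
    # c_i = sum_j a_ij h_j  -- context vector fed to the decoder
    c = a @ annotations                                        # (2n,)
    return c, a
```

The softmax makes the weights sum to one regardless of the source length T, which is how the mechanism copes with variable-length inputs.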

Finally, the soft attention mechanism is an interesting thing in its own right: it can discover the hidden relations between the input and the output by itself (a form of weakly supervised learning).

The following paper surveys applications of the soft attention mechanism to machine translation, image caption generation, video clip description, and speech recognition:

[1507.01053] Describing Multimedia Content using Attention-based Encoder--Decoder Networks


Is the asker really asking "can the context vector between the encoder and decoder be removed from the attention model?"

My understanding is that it should be possible; the model would then turn into a purely end-to-end memory network. A concrete design would need experiments to back it up.

