Efficient-Neural-Machine-Translation

public
19 stars
4 forks
0 issues

Commits

List of commits on branch master.
Unverified
c301abbe9d70973950acde23c1bbd82d959d43e9

delete some files

committed 6 years ago
Unverified
10d1cee8e8de194f4b42e67462268f2777e7059e

some updates to final version

committed 6 years ago
Unverified
168d38e538ac127ee94910eb7bfc39ae5eb219c2

Merge branch 'master' of github.com:MultiPath/Efficient-Neural-Machine-Translation

committed 7 years ago
Unverified
e2b4254e068576641aaf15b6108cdab4f7b3bf0b

add ack

committed 7 years ago
Verified
c3b6303514424b156fe623f071672b3219d8281b

Update README.md

MultiPath committed 7 years ago
Unverified
fae0bafdaa7f2737b4e3d77fa027f59352b907f4

fix typos in ack

committed 7 years ago

README

The README file for this repository.

Jiatao Gu's thesis

Title: Efficient Neural Machine Translation

Abstract:

The dream of automatic translation, which builds a bridge of communication between people from different civilizations, dates back thousands of years. Over the past decades, researchers have devoted themselves to proposing practical approaches, from rule-based machine translation to statistical machine translation. In recent years, with the general success of artificial intelligence (AI) and the emergence of neural network models, a.k.a. deep learning, neural machine translation (NMT), a new generation of machine translation framework based on sequence-to-sequence learning, has achieved state-of-the-art and even human-level translation performance on a variety of languages.

The impressive achievements of NMT are mainly due to its deep neural network structures with massive numbers of parameters, which can be efficiently tuned on vast volumes of parallel data, on the order of tens or hundreds of millions of sentences. Unfortunately, in spite of their success, neural systems also bring new challenges to machine translation, one of the central ones being efficiency. The efficiency issue involves two aspects: (1) NMT is data-hungry because of its large number of parameters, which makes training a reasonable model difficult in practice in low-resource cases. For instance, most human languages do not have enough parallel data with other languages to learn an NMT model. Moreover, documents in specialized domains such as law or medicine usually contain many professional translations, which NMT learns from less efficiently; (2) NMT is slow in computation compared to conventional methods due to its deep structure and the limitations of its decoding algorithms. In particular, low efficiency at inference time profoundly affects real-life applications and the smoothness of communication. In some cases, such as video conferencing, we also want the neural system to translate in real time, which is difficult for existing NMT models.

This dissertation attempts to tackle these two challenges. Its contributions are twofold: (1) We address the data-efficiency challenges presented by existing NMT models and introduce insights based on the characteristics of the data, including (a) developing the copy mechanism to target rote memorization in translation and general sequence-to-sequence learning; (b) using a non-parametric search engine to guide the NMT system to perform well in special domains; (c) inventing a universal NMT system for extremely low-resource languages; (d) extending the universal NMT system so that it can efficiently adapt to new languages by combining it with meta-learning.

(2) For the decoding-efficiency challenges, we develop novel structures and learning algorithms, including (a) recasting the decoding of NMT in a trainable manner to achieve state-of-the-art performance in less time; (b) inventing a non-autoregressive NMT system that enables translation in parallel; (c) developing an NMT model that learns to translate in real time using reinforcement learning.