GitXplorerGitXplorer
M

Efficient-Neural-Machine-Translation

public
19 stars
4 forks
0 issues

Commits

List of commits on branch master.
Unverified
2a9bc34a3082f1e3da4e27f37621789aa6331543

small changes

committed 7 years ago
Unverified
f726b08d4fd88a37e7c0ee01af3726cd4736e3c7

fix typos; add ack

committed 7 years ago
Unverified
f74226d9af621fd60bc5b86312bc4ab3bcebab12

still need more information

committed 7 years ago
Unverified
10e1b2f57e3a44b478f77a4d17b292cb5b91e30c

from offline back to online

committed 7 years ago
Unverified
6747cc2de16e612b46c325bf1011e95d58232be6

move to texpad

committed 7 years ago
Unverified
2f92d6b9e8abdb3f09d9e938733f7e7fb47c2ae8

initial push

MMultiPath committed 7 years ago

README

The README file for this repository.

Jiatao Gu's thesis

Title: Efficient Neural Machine Translation

Abstract:

The dream of automatic translation that builds the communication bridge between people from different civilizations dates back to thousands of years ago. For the past decades, researchers devoted to proposing practical plans, from rule-based machine translation to statistical machine translation. In recent years, with the general success of artificial intelligence (AI) and the emergence of neural network models, a.k.a. deep learning, neural machine translation (NMT), as the new generation of machine translation framework based on sequence-to-sequence learning has achieved the state-of-the-art and even human-level translation performance on a variety of languages.

The impressive achievements brought by NMT are mainly due to its deep neural network structures with massive numbers of parameters, which can be efficiently tuned from vast volume of parallel data in the order of tens or hundreds of millions of sentences. Unfortunately, in spite of their success, neural systems also bring about new challenges to machine translation, in which one of the central problems is efficiency. The efficiency issue involves two aspects: (1) NMT is data-hungry because of its vast size of parameters, which makes training a reasonable model difficult in practice for low resource cases. For instance, most of the human languages do not have enough parallel data with other languages to learn an NMT model. Moreover, documents in specialized domains such as law or medicine usually contain tons of professional translations, leading to less efficiency for NMT to learn from; (2) NMT is slow in computation compared to conventional methods due to its deep structure and limitations of the decoding algorithms. Especially the low efficiency at inference time profoundly affects the real-life application and the smoothness of the communication. In some cases like video conference, we also hope the neural system translates at real-time which, however, is difficult for the existing NMT models.

This dissertation attempts to tackle these two challenges. Contributions are twofold: (1) We address the data-efficiency challenges presented by existing NMT models and introduce insights based on the characteristics of the data, which includes (a) developing the copy-mechanism to target on rote memories in translation and general sequence-to-sequence learning; (b) using a non-parametric search-engine to guide the NMT system to perform well in special domains; (c) inventing a universal NMT system for extremely low resource languages; (d) extending the universal NMT system to be able to efficiently adapt to new languages by combing with meta-learning.

(2) For the decoding-efficiency challenges, we develop novel structures and learning algorithms, including (a) recasting the decoding of NMT in a trainable manner to achieve state-of-the-art performance with less time; (b) inventing the non-autoregressive NMT system which enables translation in parallel; (c) developing the NMT model that learns to translate in real-time using reinforcement learning.