
moses-smt

public
13 stars
4 forks
0 issues

Commits

List of commits on branch master.
  • b3305dd0a62b4d29027923a252c98a11b390d872: Fix Pig Latin example to work with reversed direction (aamake, 5 years ago)

  • 46b3e48e9d0a35095cffb62ea19d52204b63aeba: Nicely handle ^C or ^D during URL input in repl (aamake, 5 years ago)

  • 630a702603a8a15ee9ace96379efc3a87c7d1c7f: Remove unused import (aamake, 5 years ago)

  • 0278cbad83e26640968d7c694db5cbdc579a3240: Handle EOF in repl (aamake, 5 years ago)

  • a05e9e66de16b0e3804b07fe5946f602fcc0c053: Add en-en_piglatin example (aamake, 5 years ago)

  • c32b131955ff460b5db080709012a6e8b8753cc9: Add help target to Makefile (aamake, 5 years ago)

README

The README file for this repository.

Dock You a Moses

Want to play with the Moses Statistical Machine Translation system, but...

  • You don't have time to get a PhD in Setting Up Moses?

  • You have TMX files (or structured bilingual text files easily convertible to TMX) and want to use them with Moses without doing all the munging yourself?

Well now you don't have to, because I stuffed Moses in a Docker container for you.

What is this?

  • A full Moses + MGIZA installation in a Docker image: amake/moses-smt:base on Docker Hub

  • A make-based set of commands for easily

    • Converting TMX files into Moses-ready corpus files: make corpus

    • Training and tuning Moses: make train

    • Building Docker images of trained Moses instances: make build

    • Deploying trained Moses instances to Docker Hub/Amazon Elastic Beanstalk: make deploy-hub

  • Some peripheral tools:

    • A simple REPL for querying Moses over XML-RPC: mosesxmlrpcrepl.py or make repl
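
For a sense of what such a REPL involves, here is a minimal sketch of a read-translate-print loop. It is not the code in mosesxmlrpcrepl.py; the function name repl and its translate, read and write parameters are hypothetical, chosen so the loop can be driven by any callable. It exits quietly on ^D (EOF) or ^C.

```python
def repl(translate, read=input, write=print):
    """Read lines until EOF or interrupt, writing one translation per line.

    `translate` is any callable mapping a source string to a target string
    (hypothetical parameter, not taken from mosesxmlrpcrepl.py).
    """
    while True:
        try:
            line = read("> ")
        except (EOFError, KeyboardInterrupt):
            # ^D raises EOFError and ^C raises KeyboardInterrupt;
            # emit a final newline and leave the loop on either.
            write("")
            break
        line = line.strip()
        if line:  # skip blank input instead of sending it to the server
            write(translate(line))
```

Calling repl(str.upper) gives a toy session that echoes input in upper case; in practice translate would be a function that forwards the line to a running Moses server.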

Requirements

  • make

  • Docker

  • Python 3 with pip and virtualenv

  • OS X? (not tested elsewhere)

  • Some TMX files (Okapi Rainbow is a good tool for converting structured bilingual files to TMX)

Usage

First, if you are building the base image yourself, you may need to rebalance the number of cores and the amount of memory available to Docker: for example, 8 cores with only 2 GB of memory results in compilation failures, while 4 cores with 4 GB seems to work better.

  1. Put most of your TMXs in tmx-train, and the rest in tmx-tune.

  2. Run make SOURCE_LANG=<src> TARGET_LANG=<trg> [LABEL=<lbl>].

    • src and trg (required) are the language codes (not language + country) for your source and target languages, e.g. en and fr.

    • lbl is an optional label for the resulting image; myinstance by default.

  3. Wait forever.

  4. When done, you will have a Docker image tagged moses-smt:<lbl>-<src>-<trg>.

    • Run make server SOURCE_LANG=<src> TARGET_LANG=<trg> [PORT=<port>] to start mosesserver which you can query over XML-RPC.

      • Optionally specify a port; the default is 8080.
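
The first step above, splitting TMX files between tmx-train and tmx-tune, can be scripted. The sketch below is one possible approach and not part of this repository: the function name split_tmx, the 10% tuning share and the fixed shuffle seed are all assumptions, since the README only says "most" and "the rest". It copies files rather than moving them, so the originals stay in place.

```python
import random
import shutil
from pathlib import Path


def split_tmx(src_dir, train_dir="tmx-train", tune_dir="tmx-tune",
              tune_fraction=0.1, seed=0):
    """Copy *.tmx files from src_dir into train_dir and tune_dir.

    tune_fraction and seed are illustrative defaults (assumptions).
    Returns (train_paths, tune_paths) as lists of destination paths.
    """
    files = sorted(Path(src_dir).glob("*.tmx"))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(files)
    # Hold out at least one file for tuning when there are two or more.
    n_tune = max(1, round(len(files) * tune_fraction)) if len(files) > 1 else 0

    def copy_all(group, dest):
        dest = Path(dest)
        dest.mkdir(parents=True, exist_ok=True)
        return [Path(shutil.copy2(f, dest / f.name)) for f in group]

    tune_paths = copy_all(files[:n_tune], tune_dir)
    train_paths = copy_all(files[n_tune:], train_dir)
    return train_paths, tune_paths
```

Run from the repository root as split_tmx("path/to/all-tmx"), this fills tmx-train and tmx-tune, after which the make command from step 2 can be run as usual.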

What then?

  • Train a new image with swapped languages or with a new set of TMXs.

  • Use a trained instance for translation in OmegaT with the omegat-moses-mt plugin:

    • Run make server to run the server locally; the moses.server.url value is then http://localhost:8080/RPC2

    • Run make deploy-hub and then upload the resulting .zip as a new Amazon Elastic Beanstalk (EB) environment
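
A locally running server can also be queried directly from Python using the standard library. This is a minimal sketch rather than code from this repository. It assumes that mosesserver exposes an XML-RPC method named translate that takes a struct with a "text" field and returns a struct with a "text" field; check this against your Moses version. The URL default is the local one given above.

```python
import xmlrpc.client


def translate(text, url="http://localhost:8080/RPC2"):
    """Send one segment to a Moses server and return the translated text."""
    # ServerProxy works as a context manager, closing the connection on exit.
    with xmlrpc.client.ServerProxy(url) as proxy:
        # Assumption: the server's `translate` method accepts {"text": ...}
        # and replies with a struct that also carries a "text" field.
        result = proxy.translate({"text": text})
    return result["text"]
```

With make server running, translate("hello world") returns the server's translation as a string. If you started the server with a different PORT, pass the matching URL as the second argument.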