


MAchine Translation Evaluation Online (MATEO)


We present MAchine Translation Evaluation Online (MATEO), a project that aims to facilitate machine translation (MT) evaluation by means of an easy-to-use interface that can evaluate given machine translations with a battery of automatic metrics. It caters to both experienced and novice users who work with MT, such as MT system builders, researchers from the Social Sciences and Humanities, and teachers and students of (machine) translation.

MATEO can be accessed on this website, hosted by the CLARIN B center at Instituut voor de Nederlandse Taal. It is also available on Hugging Face Spaces.

If you use the MATEO interface for your work, please cite our project paper!

Vanroy, B., Tezcan, A., & Macken, L. (2023). MATEO: MAchine Translation Evaluation Online. In M. Nurminen, J. Brenner, M. Koponen, S. Latomaa, M. Mikhailov, F. Schierl, … H. Moniz (Eds.), Proceedings of the 24th Annual Conference of the European Association for Machine Translation (pp. 499–500). Tampere, Finland: European Association for Machine Translation (EAMT).

@inproceedings{vanroy-etal-2023-mateo,
    title = "{MATEO}: {MA}chine {T}ranslation {E}valuation {O}nline",
    author = "Vanroy, Bram  and
      Tezcan, Arda  and
      Macken, Lieve",
    booktitle = "Proceedings of the 24th Annual Conference of the European Association for Machine Translation",
    month = jun,
    year = "2023",
    address = "Tampere, Finland",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2023.eamt-1.52",
    pages = "499--500",
}

Self-hosting

The MATEO website is provided for free as a hosted application, so you, or anyone else, can use it. As a consequence, the service may be slow when many people are using it at the same time. Specific attention was therefore paid to making it easy for you to set up your own instance.

Duplicating a Hugging Face Space

MATEO is also running on the free platform of 🤗 Hugging Face in a so-called 'Space'. If you have a (free) account on that platform, you can easily duplicate the running MATEO instance to your own profile. That means that you can create a private duplicate of the MATEO interface just for you and free of charge! You can simply click this link or, if that does not work, follow these steps:

  1. Go to the Space;
  2. in the top right (below your profile picture), click on the three vertical dots;
  3. choose 'Duplicate space', et voilà! A new Space should now be running on your own profile.

Install locally with Python

You can clone and install the library on your own device (laptop, computer, server). I recommend running this in a new virtual environment. It requires Python >= 3.10.
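If you are unsure whether the interpreter in your virtual environment meets this requirement, a quick check like the one below can help before installing. This is only a minimal sketch: the helper name `meets_minimum` is hypothetical and not part of MATEO.

```python
import sys

# Minimum Python version stated in this README.
MINIMUM = (3, 10)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM) -> bool:
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = sys.version.split()[0]
    if meets_minimum():
        print("Python version OK:", current)
    else:
        print("MATEO requires Python >= 3.10, found", current)
```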

Run the following commands:

git clone https://github.com/BramVanroy/mateo-demo.git
cd mateo-demo
python -m pip install .
cd src/mateo_st
streamlit run 01_🎈_MATEO.py

The streamlit server will then start on your own computer. You can access the website via a local address, http://localhost:8501 by default.

Configuration options specific to Streamlit can be found here. They mostly concern server-side configuration that you typically do not need when you are running MATEO directly through Python. You may need them when you are using Docker, however, e.g. to set the --server.port that streamlit is running on (see Running with Docker).

A number of command-line arguments are available to change the interface to your needs.

--use_cuda     whether to use CUDA for translation task (CUDA for metrics not supported) (default: False)
--demo_mode    when demo mode is enabled, only a limited range of neural check-points are available. So all metrics are available but not all of the checkpoints. (default: False)

These can be passed to the Streamlit launcher by adding a -- after the streamlit command and streamlit-specific options, followed by any of the options above.

For instance, if you want to run streamlit specifically on port 1234 and you want to use the demo mode, you can modify your command to look like this:

streamlit run 01_🎈_MATEO.py --server.port 1234 -- --demo_mode

Note the separating -- in the middle so that streamlit can distinguish between streamlit's own options and the MATEO configuration parameters.
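To illustrate why the separator matters: Streamlit consumes the options before the `--` and hands everything after it to the script, which can then parse those arguments like any other command line. The sketch below assumes an `argparse`-based parser; the function name `parse_mateo_args` is hypothetical, and MATEO's actual parsing code may differ.

```python
import argparse


def parse_mateo_args(argv):
    """Parse the MATEO-specific options that follow the `--` separator.

    Illustrative only: this mirrors the two options documented above.
    """
    parser = argparse.ArgumentParser(description="MATEO options (illustrative)")
    parser.add_argument("--use_cuda", action="store_true",
                        help="use CUDA for the translation task")
    parser.add_argument("--demo_mode", action="store_true",
                        help="only offer a limited range of neural checkpoints")
    return parser.parse_args(argv)


# For `streamlit run 01_🎈_MATEO.py --server.port 1234 -- --demo_mode`,
# the script would only receive the part after the separator:
args = parse_mateo_args(["--demo_mode"])
print(args.demo_mode, args.use_cuda)  # prints: True False
```

Both options are plain switches here, so leaving one out simply keeps its default of False.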

Running with Docker

If you have Docker installed, it is very easy to get a MATEO instance running.

The following Dockerfiles are available in the docker directory. They differ slightly depending on your needs.

  • hf-spaces: specific configuration for Hugging Face spaces but without env options

  • default: a more intricate Dockerfile that accepts environment variables specific to the server, demo functionality, and CUDA. The following Docker environment variables are available:

    • PORT: server port to expose and to run the streamlit server on (default: 7860)
    • SERVER: server address to run on (default: 'localhost')
    • BASE: base path (default: '')
    • DEMO_MODE: set to true to disable some options for neural metrics and to limit the max. upload size to 1MB per file (default: '')
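The sketch below shows one plausible way a container entrypoint could translate these variables into a Streamlit launch command. It is an assumption, not a copy of the actual Dockerfile: the mapping onto Streamlit's `--server.port`, `--server.address` and `--server.baseUrlPath` options and the helper name `build_streamlit_command` are chosen for illustration, and the upload-size limit of demo mode is left out.

```python
import os


def build_streamlit_command(env=None):
    """Build a `streamlit run` argument list from the documented variables.

    Hypothetical helper: the real Dockerfile may wire these up differently.
    """
    env = os.environ if env is None else env
    port = env.get("PORT", "7860")          # documented default
    server = env.get("SERVER", "localhost")  # documented default
    base = env.get("BASE", "")
    demo_mode = env.get("DEMO_MODE", "").lower() == "true"

    cmd = ["streamlit", "run", "01_🎈_MATEO.py",
           "--server.port", port,
           "--server.address", server]
    if base:
        cmd += ["--server.baseUrlPath", base]
    if demo_mode:
        # MATEO's own options go after the `--` separator.
        cmd += ["--", "--demo_mode"]
    return cmd


print(build_streamlit_command({"PORT": "5034", "DEMO_MODE": "true"}))
```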

As an example, to build and run the repository on port 5034 with CUDA disabled and demo mode enabled, you can run the following commands, which will automatically use the most recent cpu Dockerfile from GitHub.

docker build -t mateo https://raw.githubusercontent.com/BramVanroy/mateo-demo/main/docker/cpu/Dockerfile
docker run --rm -d --name mateo-demo -p 5034:5034 --env PORT=5034 --env DEMO_MODE=true mateo

Note that the ports published with Docker's -p must match the PORT environment variable!

MATEO is now running on port 5034 and available on the local address http://localhost:5034/.
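Because the port appears in both the -p mapping and the PORT variable of the `docker run` command, it is easy to change one and forget the other. A small helper that derives both from a single value avoids that. This is a minimal sketch; `docker_run_args` is a hypothetical name, and the image and container names are simply the ones used in the example above.

```python
def docker_run_args(port, demo_mode=False, image="mateo", name="mateo-demo"):
    """Return `docker run` arguments with -p and PORT derived from one value."""
    args = ["docker", "run", "--rm", "-d", "--name", name,
            "-p", f"{port}:{port}",     # host port : container port
            "--env", f"PORT={port}"]    # port the streamlit server listens on
    if demo_mode:
        args += ["--env", "DEMO_MODE=true"]
    args.append(image)
    return args


print(" ".join(docker_run_args(5034, demo_mode=True)))
# prints: docker run --rm -d --name mateo-demo -p 5034:5034 --env PORT=5034 --env DEMO_MODE=true mateo
```

The resulting list can be passed to `subprocess.run` as-is if you prefer to start the container from Python.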

You can modify the Dockerfiles as you wish. Most notably, you may want to change the streamlit launcher command itself, where you can combine the streamlit options with the MATEO-specific options mentioned in the previous section.

Tests

The tests are run using pytest and playwright. To ensure that the right dependencies are installed, you can run

python -m pip install -e .[dev]

Then, install the appropriate chromium version for playwright. You can do this by running the following command.

playwright install --with-deps chromium

Now you can run the tests by running the following command in the root directory of the project.

python -m pytest

Notes

Using CUDA

Using CUDA for the metrics is currently not supported. It is, however, possible to use CUDA for the translation task by setting the --use_cuda flag when running the Streamlit server. The reason for this restriction is memory consumption: since streamlit creates a separate instance for each user, the GPU may quickly run out of memory (OOM), and moving models on and off the device is not feasible.

I have not found a solution for this yet. A queueing system with a separate backend and dedicated workers would solve the issue, but that defeats the purpose of having a simple, easy-to-use interface. It would also require storing data for longer, which many users may not want, considering that I've received many questions about whether I save their data on disk (I don't - the current approach processes everything in memory).
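For illustration only, the queueing idea can be reduced to a single worker that runs submitted jobs one at a time, so that only one computation would hold the GPU at any moment. This is not part of MATEO: the `score` function is a CPU-only stand-in for a metric, the names are hypothetical, and a real setup would need a separate backend process rather than a thread inside the app.

```python
from concurrent.futures import ThreadPoolExecutor

# A single worker means jobs are executed strictly one after another,
# in the order in which they were submitted.
gpu_worker = ThreadPoolExecutor(max_workers=1)


def score(translations):
    """Stand-in for an expensive metric: counts the words per translation."""
    return [len(text.split()) for text in translations]


# Two "users" submit work at the same time; the second job waits for the first.
job_a = gpu_worker.submit(score, ["hello world", "hi"])
job_b = gpu_worker.submit(score, ["a longer example sentence"])

print(job_a.result(timeout=5), job_b.result(timeout=5))
```

Note that the submitted data has to stay around until its job is processed, which is exactly the longer data retention described above.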

Acknowledgements

This project was kickstarted by a Sponsorship project from the European Association for Machine Translation and supported by a substantial follow-up grant from CLARIN.eu.
