backtranslated-imdb

Commits

List of commits on branch master, all by sshleifer, committed 5 years ago:

1748960748619bc95f4a016309a18887249616be  more data
6848cd9c5ee7c9912647d898913bd9af6ff19371  imdb_train
940d9739bc5fee5207abb12f7ad7ff1c2adb6445  da train and pt train
3e7db2c2266849afb8c696ca8350dd7945384060  cs test
d428746dcc6d0b7ea9800c7a0734afafb9c24bee  AF train
d2fc20c9da5f0637a931cd25a1423f888b013847  Merge branch 'master' of github.com:sshleifer/text-augmentation

README

text-augmentation

Backtranslated IMDb movie reviews. Each directory is named imdb_{language_code} and mirrors the directory structure of the original IMDb dataset.
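As a rough illustration of how one of these directories might be read, here is a minimal sketch. It assumes the layout of the commonly distributed IMDb sentiment dataset (a split folder such as train or test, containing pos and neg subfolders of .txt files); that layout and the helper name iter_reviews are assumptions, not something this repository documents.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_reviews(root: Path, split: str = "train") -> Iterator[Tuple[str, int]]:
    """Yield (text, label) pairs from root/split/{neg,pos}/*.txt.

    Assumed layout (not confirmed by the README): label 0 for neg, 1 for pos.
    """
    for label_name, label in (("neg", 0), ("pos", 1)):
        folder = root / split / label_name
        if not folder.is_dir():
            # Some language directories may only contain one split.
            continue
        for path in sorted(folder.glob("*.txt")):
            yield path.read_text(encoding="utf-8"), label


if __name__ == "__main__":
    # "imdb_it" follows the imdb_{language_code} naming described above.
    for text, label in iter_reviews(Path("imdb_it")):
        print(label, text[:80])
```

If the actual directories differ from this layout, only the folder names in the loop should need adjusting.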

Backtranslating movie reviews to more languages

To backtranslate the training data through Italian, run:

python cache_backtranslations.py --imdb_dir imdb/train/ --target_language it
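To cover several languages in one go, the same command can be driven from a small wrapper. The sketch below uses only the two flags shown above; the helper names build_command and run_all, and the example list of language codes, are illustrative assumptions.

```python
import subprocess
import sys
from typing import Iterable, List


def build_command(language: str, imdb_dir: str = "imdb/train/") -> List[str]:
    """Build the argv for one backtranslation run, using only the documented flags."""
    return [
        sys.executable,
        "cache_backtranslations.py",
        "--imdb_dir", imdb_dir,
        "--target_language", language,
    ]


def run_all(languages: Iterable[str], imdb_dir: str = "imdb/train/") -> None:
    """Run the script once per language code, stopping at the first failure."""
    for language in languages:
        subprocess.run(build_command(language, imdb_dir), check=True)


if __name__ == "__main__":
    # Example language codes only; substitute the ones you need.
    run_all(["it", "da", "pt"])
```

Running the languages sequentially keeps failures easy to attribute to a single language code.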

Backtranslating other text

To backtranslate other text, modify cache_backtranslations.py so that it reads from and writes to your own paths.
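The general shape of such a change might look like the sketch below. It is not taken from cache_backtranslations.py: the function backtranslate_dir is hypothetical, and the backtranslate callable stands in for whatever translation round trip the script actually performs.

```python
from pathlib import Path
from typing import Callable


def backtranslate_dir(
    src_dir: Path,
    dst_dir: Path,
    backtranslate: Callable[[str], str],
) -> int:
    """Write a backtranslated copy of every .txt file under src_dir into dst_dir.

    The relative directory structure is preserved, and files that already
    exist in dst_dir are skipped so an interrupted run can be resumed.
    Returns the number of files written.
    """
    written = 0
    for src in sorted(src_dir.rglob("*.txt")):
        dst = dst_dir / src.relative_to(src_dir)
        if dst.exists():
            continue  # already cached
        dst.parent.mkdir(parents=True, exist_ok=True)
        text = src.read_text(encoding="utf-8")
        dst.write_text(backtranslate(text), encoding="utf-8")
        written += 1
    return written
```

Passing the translation step in as a callable keeps the path handling independent of any particular translation service.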