uSIF

This is an implementation of unsupervised smoothed inverse frequency (uSIF), a simple but effective way to create sentence embeddings without any labelled data (Best Paper, RepL4NLP @ ACL 2018). See the paper for more details.

Update (01/11/18): the code now works with Python 3 instead of Python 2.

Setup

  1. Unzip the pre-trained ParaNMT word vectors (thanks to John Wieting for providing them).
  2. Install the Python packages listed in requirements.txt.
  3. Initialize a uSIF embedding model with usif.py. Call get_paranmt_usif to get the model that uses the ParaNMT vectors, then call test_STS to check that you get the expected results. Once you know it works, feel free to try it with other word vectors.

Embedding Individual Sentences

If you don't have a sizable list of related sentences to embed, there is little point in doing piecewise common component removal, so you can set m = 0 when initializing uSIF. Even for STS tasks, setting m = 0 decreases performance by only 1-4%.
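
As a rough illustration of an embedding with no common component removal, the sketch below computes a frequency-weighted average of word vectors for a single sentence. It is not the code in usif.py. The function name, the toy vectors and probabilities, and the constant a are all assumptions made for this example. The weight a / (a + p(w)) is the generic SIF-style form, not necessarily the exact weighting this repository uses.

```python
def weighted_average_embedding(sentence, vectors, probs, a=1e-3):
    """Average the word vectors of a sentence, down-weighting frequent words.

    Hypothetical helper for illustration only; `a` is a hand-picked constant.
    """
    # Keep only tokens that have both a vector and a unigram probability.
    tokens = [t for t in sentence.split() if t in vectors and t in probs]
    dim = len(next(iter(vectors.values())))
    if not tokens:
        return [0.0] * dim

    total = [0.0] * dim
    for token in tokens:
        # SIF-style weight: rare words (small p) get a weight close to 1.
        weight = a / (a + probs[token])
        for i, x in enumerate(vectors[token]):
            total[i] += weight * x
    return [x / len(tokens) for x in total]


# Toy 2-dimensional vectors and unigram probabilities (made up for the example).
toy_vectors = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [1.0, 1.0]}
toy_probs = {"the": 0.05, "cat": 0.001, "sat": 0.002}

embedding = weighted_average_embedding("the cat sat", toy_vectors, toy_probs)
print(embedding)
```

In this sketch each sentence is embedded on its own, so the result does not depend on which other sentences are embedded alongside it. Common component removal, by contrast, relies on statistics gathered across a list of sentences.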

Reference

If you use this code, please cite

@article{ethayarajh2018unsupervised,
  title={Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline},
  author={Ethayarajh, Kawin},
  journal={ACL 2018},
  pages={91},
  year={2018}
}