pytorch-lm
Commits (branch master)

- 72f9674ba1c7b5f1c14b52669ab4d2be8181cf4f: Update requirements.txt (rrbournhonesque, 5 years ago)
- 894ada60cd94c0d43db71411cc5410c9dddb1aff: Update requirements.txt (rrbournhonesque, 5 years ago)
- 10e551c1d9f2af2d94809cf6d93d4e5e56d9cf89: Update requirements.txt (rrbournhonesque, 5 years ago)
- 07c9c7c63f3b68f51c924c20645aceaed1bc546a: Add QRNN architecture (rrbournhonesque, 5 years ago)
- f40e64eeff70d429ea2dc17aee3451fccc249d01: Workaround to fix bug in Pytorch resulting in incorrect attention masking (rrbournhonesque, 5 years ago)
- d0f264e9b036d632a6113d8c8cb8311ab4a72b4f: first commit (rrbournhonesque, 5 years ago)

README

Word-level language modeling RNN

This example trains a multi-layer RNN (Elman, GRU, or LSTM) or a Transformer model on a language modeling task. By default, the training script uses the Wikitext-2 dataset, which is provided with the repository. The trained model can then be used by the generate script to generate new text.

python main.py --cuda --epochs 6           # Train an LSTM on Wikitext-2 with CUDA
python main.py --cuda --epochs 6 --tied    # Train a tied LSTM on Wikitext-2 with CUDA
python main.py --cuda --epochs 6 --model Transformer --lr 5
                                           # Train a Transformer model on Wikitext-2 with CUDA
python main.py --cuda --tied               # Train a tied LSTM on Wikitext-2 with CUDA for 40 epochs (the default)
python generate.py                         # Generate samples from the trained LSTM model.
python generate.py --cuda --model Transformer
                                           # Generate samples from the trained Transformer model.
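Generation with generate.py amounts, conceptually, to repeatedly sampling the next word from the model's output distribution. A minimal sketch of one such sampling step, with illustrative names rather than the exact code in generate.py:

import torch

def sample_next_word(logits, temperature=1.0):
    # logits: 1-D tensor of size ntokens for the most recent time step.
    # A higher temperature flattens the distribution; a lower one sharpens it.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

The sampled index is then mapped back to a word through the vocabulary and fed in as the next input token.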

The model uses the nn.RNN module (and its sister modules nn.GRU and nn.LSTM), which automatically use the cuDNN backend when run on a CUDA device with cuDNN installed.
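How the recurrent layer is selected from the --model flag can be sketched roughly as follows (illustrative, not the exact code in this repository):

import torch.nn as nn

def build_rnn(rnn_type, emsize, nhid, nlayers, dropout):
    # LSTM and GRU map directly onto their nn modules; both dispatch to the
    # cuDNN kernels automatically when run on a CUDA device with cuDNN.
    if rnn_type in ("LSTM", "GRU"):
        return getattr(nn, rnn_type)(emsize, nhid, nlayers, dropout=dropout)
    # RNN_TANH / RNN_RELU map onto nn.RNN with the matching nonlinearity.
    nonlinearity = {"RNN_TANH": "tanh", "RNN_RELU": "relu"}[rnn_type]
    return nn.RNN(emsize, nhid, nlayers, nonlinearity=nonlinearity, dropout=dropout)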

During training, if a keyboard interrupt (Ctrl-C) is received, training is stopped and the current model is evaluated against the test dataset.
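The interrupt handling amounts to wrapping the epoch loop in a try/except. A hedged sketch, with train_one_epoch and evaluate standing in for the functions defined in main.py:

import math

def train_with_interrupt(train_one_epoch, evaluate, epochs, val_data, test_data):
    # Ctrl-C stops training cleanly and falls through to the final test evaluation.
    try:
        for epoch in range(1, epochs + 1):
            train_one_epoch(epoch)
            val_loss = evaluate(val_data)
            print(f"epoch {epoch} | valid ppl {math.exp(val_loss):8.2f}")
    except KeyboardInterrupt:
        print("Exiting from training early")
    test_loss = evaluate(test_data)
    print(f"test ppl {math.exp(test_loss):8.2f}")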

The main.py script accepts the following arguments:

optional arguments:
  -h, --help                       show this help message and exit
  --data DATA                      location of the data corpus
  --model MODEL                    type of model (RNN_TANH, RNN_RELU, LSTM, GRU, Transformer)
  --emsize EMSIZE                  size of word embeddings
  --nhid NHID                      number of hidden units per layer
  --nlayers NLAYERS                number of layers
  --lr LR                          initial learning rate
  --clip CLIP                      gradient clipping
  --epochs EPOCHS                  upper epoch limit
  --batch_size N                   batch size
  --bptt BPTT                      sequence length
  --dropout DROPOUT                dropout applied to layers (0 = no dropout)
  --decay DECAY                    learning rate decay per epoch
  --tied                           tie the word embedding and softmax weights
  --seed SEED                      random seed
  --cuda                           use CUDA
  --log-interval N                 report interval
  --save SAVE                      path to save the final model
  --transformer_head N             the number of heads in the encoder/decoder of the transformer model
  --transformer_encoder_layers N   the number of layers in the encoder of the transformer model
  --transformer_decoder_layers N   the number of layers in the decoder of the transformer model
  --transformer_d_ff N             the number of hidden units in the feed-forward layer of the transformer model
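The --tied flag shares one weight matrix between the input embedding and the output (softmax) projection, which requires emsize to equal nhid. A minimal sketch of the idea, with illustrative names rather than the exact code in model.py:

import torch.nn as nn

class TiedEmbeddingDecoder(nn.Module):
    def __init__(self, ntoken, emsize, nhid, tied=True):
        super().__init__()
        self.encoder = nn.Embedding(ntoken, emsize)   # input word embeddings
        self.decoder = nn.Linear(nhid, ntoken)        # hidden state -> vocabulary logits
        if tied:
            if nhid != emsize:
                raise ValueError("When tying weights, nhid must equal emsize")
            self.decoder.weight = self.encoder.weight  # share a single parameter tensor

This is why the tied example runs below keep --emsize and --nhid identical.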

With these arguments, a variety of models can be tested. As an example, the following arguments produce slower but better models:

python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40           
python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied    
python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40        
python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tied