This Python project uses LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) recurrent neural networks to forecast time series with Keras + Theano. We compare the results produced by each of these deep networks with those from a linear regression model.
Dataset: Number of daily births in Quebec, Jan. 01 '77 - Dec. 31 '90 (Hipel & McLeod, 1994)
## Usage
I suggest you install virtualenv before trying this out.

```sh
git clone https://github.com/dhrushilbadani/deeplearning-timeseries.git
cd deeplearning-timeseries
virtualenv ENV
source ENV/bin/activate
pip install --upgrade pip
pip install keras h5py pandas sklearn
python evaluate.py
```
## Architecture & Model Properties
We use Keras' Sequential model to construct the recurrent neural networks. There are 3 layers (see the sketch after this list):
- Layer 1: either an LSTM layer (output dimension 10, statefulness enabled) or a GRU layer (output dimension 4).
- Layer 2: a Dropout layer with dropout probability 0.2, to prevent overfitting.
- Layer 3: a fully-connected Dense layer with output dimension 1.
- Default optimizer: rmsprop; default number of epochs: 150.
- Evaluation metric: mean squared error (MSE).
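A minimal sketch of this stack is below. The layer sizes, optimizer, and loss are the defaults listed above; the window length and everything else are illustrative assumptions and may differ from what `evaluate.py` actually does (older Keras 1.x releases, which this Keras + Theano project may target, use `output_dim`/`nb_epoch` instead of the Keras 2 names `units`/`epochs`).

```python
# Rough sketch of the 3-layer model described above (Keras 2 argument names).
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

look_back = 37  # length of the input window; an assumption, not read from evaluate.py

model = Sequential()
# Layer 1: LSTM with output dimension 10 (use GRU(4, ...) for the GRU variant)
model.add(LSTM(10, input_shape=(look_back, 1)))
# Layer 2: Dropout with probability 0.2 to reduce overfitting
model.add(Dropout(0.2))
# Layer 3: fully-connected output producing the next value of the series
model.add(Dense(1))

# Defaults from this README: rmsprop optimizer, MSE loss, 150 epochs
model.compile(optimizer='rmsprop', loss='mean_squared_error')
# model.fit(trainX, trainY, epochs=150, batch_size=1)
```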
## Results & Observations
- The LSTM-RNN model performed best, with an MSE of 1464.78 (look-back = 37).
- Naively making the RNN "deeper" did not yield immediate improvements; I did not fine-tune the parameters (output_dim, for example), though.
- Making the LSTM network stateful (setting `stateful=True` when initializing the LSTM layer) did yield a significant performance improvement (see the sketch after this list). In stateless LSTM layers, the cell states are reset after each sequence. When `stateful=True`, however, the states are propagated onto the next batch, i.e. the state of the sample located at index `trainX[i]` will be used in the computation of the sample `trainX[i+k]` in the next batch, where `k` is the batch size. You can read more about this in the Keras docs.
- Using Glorot initialization yielded a performance improvement. However, He uniform initialization (scaled by fan_in) yielded even better results than Glorot.
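For reference, here is a hedged sketch of how a stateful LSTM layer with a non-default initializer might be set up. The batch size and look-back are assumptions, and the initializer argument is `init` rather than `kernel_initializer` in the Keras 1.x releases this project was written against.

```python
# Illustrative only: stateful layers need a fixed batch size supplied via
# batch_input_shape, and states must be reset manually between epochs.
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

batch_size, look_back = 1, 37  # assumed values, not taken from evaluate.py

model = Sequential()
model.add(LSTM(10,
               batch_input_shape=(batch_size, look_back, 1),
               stateful=True,                     # carry state from batch to batch
               kernel_initializer='he_uniform'))  # or 'glorot_uniform'
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(optimizer='rmsprop', loss='mean_squared_error')

# Typical stateful training loop: one epoch at a time, no shuffling,
# resetting states after each pass over the data.
# for _ in range(150):
#     model.fit(trainX, trainY, batch_size=batch_size, epochs=1, shuffle=False)
#     model.reset_states()
```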
## Files

## To-do

## References
- On the use of ‘Long-Short Term Memory’ neural networks for time series prediction, Gómez-Gil et al., 2014.
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Srivastava et al., 2014.
- Learning to Forget: Continual Prediction with LSTM, Gers, Schmidhuber & Cummins, 2000.
- Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, Chung et al., 2014.