Python implementation of the proposal described in the paper:
A Hybrid Approach to Mining Conditions - HAIS, 2018 - Fernando O. Gallego and Rafael Corchuelo
Project structure:
- datasets/ Datasets folder
  - dataset-en
  - dataset-en-lite
  - dataset-es
  - dataset-es-lite
- models/ Word2vec models folder
  - w2v-modelv2-en
  - w2v-modelv2-es
- LICENCE
- candidates_creator.py
- main.py
- model_factory.py
- README
- validation.py
- word_preprocessing.py
- word_vectorizer.py
Requirements:
- Python 3.5.4 or above
- Theano 0.9.0
- Keras 2.0.8
- NLTK 3.2.4, with the punkt and snowball_data resources installed
- NumPy 1.13.1
- scikit-learn 0.19.0
- Gensim 2.3.0
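The NLTK resources named in the requirements can be fetched with NLTK's own downloader. The snippet below is only a sketch: the helper name download_nltk_resources is hypothetical and not part of this project, and it assumes that the Snowball data is published under the NLTK identifier snowball_data.

```python
# Hypothetical helper, not part of this repository.
# Assumption: the resources are registered in NLTK as "punkt" and "snowball_data".
NLTK_RESOURCES = ("punkt", "snowball_data")


def download_nltk_resources(resources=NLTK_RESOURCES):
    """Download each NLTK resource and report whether the download succeeded."""
    import nltk  # imported lazily so the module loads even without NLTK installed

    # nltk.download returns True on success; quiet=True suppresses progress output.
    return {name: nltk.download(name, quiet=True) for name in resources}
```

Calling download_nltk_resources() once after installing NLTK should be enough; it needs network access and returns a dictionary mapping each resource name to True or False.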
main.py is the entry point of our experiments. It takes the following positional parameters, in this order:
- relative path of the dataset's file
- language selected
- relative path of the word2vec model's file
- number of folds to perform k-fold cross validation
- deep learning model to use in the experiment (name of the class inside model_factory.py)
- relative path of the output csv file with the performance results
- score threshold used to decide whether a candidate is a condition
Example:
python main.py datasets/dataset-en en models/w2v-modelv2-en 4 ModelA results/results-ModelA-en.csv 0.75
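As an illustration of the parameter order described above, the seven positional parameters could be parsed as in the following sketch. It is not taken from main.py: the argument names, the build_parser and is_condition helpers, and the use of >= for the threshold comparison are all assumptions made for the example.

```python
import argparse


def build_parser():
    """Build a parser for the seven positional parameters (names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Run a condition-mining experiment.")
    parser.add_argument("dataset_path", help="relative path of the dataset file")
    parser.add_argument("language", help="language selected, e.g. en")
    parser.add_argument("w2v_model_path", help="relative path of the word2vec model file")
    parser.add_argument("n_folds", type=int, help="number of folds for k-fold cross validation")
    parser.add_argument("model_name", help="name of a class inside model_factory.py")
    parser.add_argument("output_csv", help="relative path of the output csv file")
    parser.add_argument("threshold", type=float, help="score threshold for conditions")
    return parser


def is_condition(score, threshold):
    """Assumption: a candidate counts as a condition when its score reaches the threshold."""
    return score >= threshold


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this layout the folds and the threshold arrive already converted to int and float, and a missing or malformed parameter produces a usage message instead of a failure deep inside the experiment.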