HAIS18

Commits on branch master

  • 46df7907fef681ddf24ffeaf6ad8b770d6579664 (FernanOrtega, 7 years ago): Additional models added to perform a more exhaustive experimentation
  • bdad006ed001fc0546b1b83e9880cafeaab02fbc (FernanOrtega, 7 years ago): Removed unnecessary calculus.
  • f11a3ab8eecbffe5033b8630e3a572d5404bd292 (FernanOrtega, 7 years ago): Fix to prevent problems with wrong deptrees.
  • 5d5b927d84a761d3278928a34b53d0d0725d5ba4 (FernanOrtega, 7 years ago): Added a lite version of English dataset for testing purposes.
  • 19ceac5bd5510b760f170f50f69fba8597d35fcd (FernanOrtega, 7 years ago): Fixed the number of epochs
  • 684684c2c618bde2dfd67c1ba43d80f79e8b72f5 (FernanOrtega, 7 years ago): Finished v1

README

HAIS18

Python implementation of the approach proposed in the paper:

A Hybrid Approach to Mining Conditions - HAIS, 2018 - Fernando O. Gallego and Rafael Corchuelo

Repository contents

  • datasets/: dataset folder
    • dataset-en
    • dataset-en-lite
    • dataset-es
    • dataset-es-lite
  • models/: word2vec model folder
    • w2v-modelv2-en
  • LICENCE
  • candidates_creator.py
  • main.py
  • model_factory.py
  • README
  • validation.py
  • word_preprocessing.py
  • word_vectorizer.py

Requirements

  • Python 3.5.4 or above
  • Theano 0.9.0
  • Keras 2.0.8
  • NLTK 3.2.4 with the punkt and snowball_data data packages installed
  • NumPy 1.13.1
  • scikit-learn 0.19.0
  • Gensim 2.3.0
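
The pinned versions above can be checked against the active environment with a short script. This is a minimal sketch, not part of the repository: the helper names are hypothetical, it relies on importlib.metadata (Python 3.8+, so newer than the project's 3.5 baseline), and it compares versions as exact strings because the list pins exact releases. NLTK's punkt and snowball_data resources are data packages rather than PyPI distributions, so they are not covered here; nltk.download("punkt") and nltk.download("snowball_data") fetch them.

```python
from importlib import metadata
from typing import Callable, Dict, List, Optional, Tuple

# Pinned versions copied from the Requirements list, keyed by PyPI distribution name.
REQUIRED = {
    "Theano": "0.9.0",
    "Keras": "2.0.8",
    "nltk": "3.2.4",
    "numpy": "1.13.1",
    "scikit-learn": "0.19.0",
    "gensim": "2.3.0",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(
    required: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> List[Tuple[str, str, Optional[str]]]:
    """List (name, expected, found) for every package whose version differs or is absent."""
    mismatches = []
    for name, expected in sorted(required.items()):
        found = lookup(name)
        if found != expected:
            mismatches.append((name, expected, found))
    return mismatches
```

Calling version_mismatches(REQUIRED) returns an empty list when every pinned package is installed at exactly the listed version; the lookup parameter exists so the comparison can be exercised without touching the real environment.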

Usage

main.py is the entry point for the experiments. It takes the following command-line arguments, in this order:

  1. relative path to the dataset file
  2. language of the dataset (e.g. en)
  3. relative path to the word2vec model file
  4. number of folds k for k-fold cross-validation
  5. deep learning model to use in the experiment (the name of a class defined in model_factory.py)
  6. relative path to the output CSV file where the performance results are written
  7. score threshold used to decide whether a candidate is a condition
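
The seven arguments above map naturally onto a positional argument parser, and argument 5 implies looking a class up by name in model_factory.py. The sketch below illustrates both under stated assumptions: the real main.py may read sys.argv directly, and the parser, the attribute names (dataset_path, n_folds, and so on) and resolve_model_class are hypothetical illustrations, not the project's code.

```python
import argparse
import inspect
import types


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the seven positional arguments documented for main.py."""
    parser = argparse.ArgumentParser(description="Run a condition-mining experiment.")
    parser.add_argument("dataset_path", help="relative path to the dataset file")
    parser.add_argument("language", help="language of the dataset, e.g. en")
    parser.add_argument("w2v_model_path", help="relative path to the word2vec model file")
    parser.add_argument("n_folds", type=int, help="number of folds for k-fold cross-validation")
    parser.add_argument("model_name", help="name of a class defined in model_factory.py")
    parser.add_argument("output_csv", help="relative path to the output CSV file")
    parser.add_argument("threshold", type=float, help="score threshold for the condition class")
    return parser


def resolve_model_class(module: types.ModuleType, name: str) -> type:
    """Return the class called `name` from `module` (hypothetical helper)."""
    candidate = getattr(module, name, None)
    if not inspect.isclass(candidate):
        # str.format instead of f-strings keeps the sketch valid on Python 3.5.
        raise ValueError("{!r} is not a class in module {}".format(name, module.__name__))
    return candidate


# Arguments from the README's example invocation, with the dataset folder
# spelled as listed under Repository contents.
args = build_parser().parse_args([
    "datasets/dataset-en", "en", "models/w2v-modelv2-en", "4",
    "ModelA", "results/results-ModelA-en.csv", "0.75",
])
```

In an actual entry point the module argument would come from import model_factory; it is passed in explicitly here so the sketch runs without the repository checked out.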

Example of use:

python main.py datasets/dataset-en en models/w2v-modelv2-en 4 ModelA results/results-ModelA-en.csv 0.75
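
The example runs 4-fold cross-validation and labels candidates using a threshold of 0.75. The dependency-free sketch below illustrates those two steps; it is not taken from main.py. It assumes contiguous, unshuffled folds with the same sizes scikit-learn's KFold(shuffle=False) produces (the project lists scikit-learn, so its real code may use KFold directly or shuffle), that a score at or above the threshold counts as a condition, and that performance is measured as precision, recall and F1. The README does not say which metrics the output CSV contains, and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into n_folds contiguous (train, test) index pairs.

    The first n_samples % n_folds folds get one extra sample, matching the
    fold sizes of scikit-learn's KFold without shuffling.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    folds = []
    start = 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def threshold_metrics(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> Tuple[float, float, float]:
    """Precision, recall and F1 when candidates scoring >= threshold are predicted as conditions."""
    predicted = [score >= threshold for score in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


print([len(test) for _, test in k_fold_indices(10, 4)])  # [3, 3, 2, 2]
print(threshold_metrics([0.9, 0.8, 0.3, 0.6], [True, False, True, False], 0.75))  # (0.5, 0.5, 0.5)
```

In a full run, the model would be trained on each fold's train indices and scored on its test indices, and the per-fold metrics averaged before being written out.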