Discosense: Commonsense Reasoning with Discourse Connectives

EMNLP 2022. Prajjwal Bhargava, Vincent Ng


Paper: arXiv

Data can be found in the /data directory, which contains the training and test sets.

Usage (requires Huggingface Datasets)

Install datasets:

$ pip3 install datasets

You can now load discosense in a few lines of code:

from datasets import load_dataset
train_dataset = load_dataset("prajjwal1/discosense", split="train")
test_dataset = load_dataset("prajjwal1/discosense", split="test")

The same data is also available locally in the /data directory.

Models

Generative models can be found here:

These models were trained as follows. Input: [control code] Sentence 1. Output: Sentence 2 (ground truth).

Model name Model Links
ctrl_discovery_1 Model Link
ctrl_discovery_2 Model Link
ctrl_discovery_3 Model Link
ctrl_discovery_4 Model Link
ctrl_discovery_5 Model Link
ctrl_discovery_6 Model Link
ctrl_discovery_7 Model Link
ctrl_discovery_8 Model Link
ctrl_discovery_9 Model Link
ctrl_discovery_10 Model Link
ctrl_discovery_11 Model Link
ctrl_discovery_12 Model Link
ctrl_discovery_13 Model Link
ctrl_discovery_14 Model Link

We also provide generative models trained in the flipped direction.

These models were trained as follows. Input: [control code] Sentence 2. Output: Sentence 1 (ground truth).

Model name Model Links
ctrl_discovery_flipped_1 Model Link
ctrl_discovery_flipped_2 Model Link
ctrl_discovery_flipped_3 Model Link
ctrl_discovery_flipped_4 Model Link
ctrl_discovery_flipped_5 Model Link
ctrl_discovery_flipped_6 Model Link

[control code] can be replaced by these discourse markers:

| although | in other words | particularly | rather |
| as a result | in particular | similarly |
| by contrast | in short | in sum | specifically |
| because of this | interestingly | subsequently | because of that |
| but | instead | thereafter | thereby | likewise |
| consequently | conversely | nevertheless | therefore |
| for example | nonetheless | though | for instance |
| on the contrary | thus | hence | on the other hand | yet |
| however | otherwise | in contrast | overall |
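
To make the two training formats concrete, here is a minimal Python sketch that builds an (input, target) pair for either direction. The helper name `build_example`, the single-space join between the control code and the sentence, and the shortened marker set are assumptions for illustration; the actual preprocessing is done by the training script.

```python
# Illustrative subset of the discourse markers listed above.
MARKERS = {"although", "as a result", "because of this", "but", "however", "therefore"}


def build_example(marker, sentence1, sentence2, flipped=False):
    """Return (model_input, target) for one sentence pair.

    Forward models (ctrl_discovery_*): input "[control code] Sentence 1", target Sentence 2.
    Flipped models (ctrl_discovery_flipped_*): input "[control code] Sentence 2", target Sentence 1.
    The single-space join is an assumption, not the repository's confirmed format.
    """
    if marker not in MARKERS:
        raise ValueError(f"unknown discourse marker: {marker!r}")
    context, target = (sentence2, sentence1) if flipped else (sentence1, sentence2)
    return f"{marker} {context}", target


forward = build_example("however", "It rained all day.", "We still went hiking.")
flipped = build_example("however", "It rained all day.", "We still went hiking.", flipped=True)
print(forward)
print(flipped)
```

The only difference between the two model families in this sketch is which sentence serves as context and which as the generation target.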

Conditional Adversarial Filtering

run_af.py can fine-tune a model, run CAF, and run inference. These modes are selected by passing different flags.

To run conditional or non-conditional adversarial filtering:

export OUTPUT_DIR='../../experiments/albert_large_meh' # Directory where the model will be saved
export RAW_DATA=''
# Input data: a JSON file that has only the context, discourse marker and ending (options still need to be generated for it).

export TRAIN_DATA='' # This file will be created once CTRL has generated the training data.
# To generate it, do not pass the `run_inference_only` flag. Remove `replace_one` if you want all 3 options to be generated.
# Add `replace_one` if you want only one option to be generated. This is useful for CAF,
# because when CAF runs we only want to replace the most redundant option.

export BS=16                                    
export CONTEXT_COL='sentence1'            # Key to use for getting context
export TO_PREDICT_COL='sentence2'         # Sentence which requires to be generated by the model
export MARKER_COL='marker' 
export EPOCHS=4
export WARMUP_STEPS=4000

export CLASSIFICATION_MODEL='' # Discriminator LM, e.g. 'roberta-large'
export AUTOREGRESSIVE_MODEL='prajjwal1/ctrl_discovery_5'                        # Generator LM
export VALIDATION_DATA=''
# File path for validation data. If provided, CAF uses it to filter out examples.

export FILE_OUTPUT_PATH=''              # Output file written once AF has completed (created by the script)

# `replace_one` will replace only one option during AF (to generate all 3 options, remove this flag)
# `run_inference_only` will perform inference (for training, remove this flag)


python3 run_af.py --replace_one --run_inference_only --classification_model_name_or_path $CLASSIFICATION_MODEL \
                  --autoregressive_model_name_or_path $AUTOREGRESSIVE_MODEL --raw_data_path $RAW_DATA \
                  --train_data_path $TRAIN_DATA --validation_data_path $VALIDATION_DATA --output_dir $OUTPUT_DIR \
                  --per_device_train_batch_size $BS --per_device_eval_batch_size $((BS*8)) \
                  --num_train_epochs $EPOCHS --file_output_path $FILE_OUTPUT_PATH --context_col $CONTEXT_COL \
                  --to_predict_col $TO_PREDICT_COL --marker_col $MARKER_COL --fp16 --save_total_limit 1 --save_strategy epoch \
                  --evaluation_strategy epoch --warmup_steps $WARMUP_STEPS
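
Because the mode of run_af.py depends on which flags are present, it can help to assemble the command programmatically. The following is a rough Python sketch, not part of the repository: `build_af_command` is a hypothetical helper, the file names are placeholders, and only a subset of the flags shown above is included.

```python
import shlex


def build_af_command(classifier, generator, raw_data, train_data, output_dir,
                     file_output_path, replace_one=True, inference_only=False,
                     batch_size=16):
    """Assemble an argv list for run_af.py using only flags shown above."""
    cmd = ["python3", "run_af.py"]
    if replace_one:
        # Replace a single option per example (the setting described for CAF).
        cmd.append("--replace_one")
    if inference_only:
        # Skip training and only run inference.
        cmd.append("--run_inference_only")
    cmd += [
        "--classification_model_name_or_path", classifier,
        "--autoregressive_model_name_or_path", generator,
        "--raw_data_path", raw_data,
        "--train_data_path", train_data,
        "--output_dir", output_dir,
        "--file_output_path", file_output_path,
        "--per_device_train_batch_size", str(batch_size),
        "--per_device_eval_batch_size", str(batch_size * 8),  # mirrors $((BS*8))
    ]
    return cmd


# Hypothetical file names, used only to show the resulting command line.
cmd = build_af_command(
    classifier="roberta-large",
    generator="prajjwal1/ctrl_discovery_5",
    raw_data="raw.json",
    train_data="train.json",
    output_dir="out",
    file_output_path="filtered.json",
    replace_one=False,  # generate all 3 options instead of replacing one
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` from the directory that contains run_af.py.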

Training CTRL on your own data

This is the command used for training CTRL; modify it to suit your own data.

python3 run_clm_discovery.py --model_name_or_path ctrl  --do_eval \
        --per_device_train_batch_size 24 --per_device_eval_batch_size 42 \
        --output_dir ~/apex/experiment/ctrl_discovery_flipped_6 \
        --preprocessing_num_workers 4 --evaluation_strategy no \
        --tokenizer_name ctrl --fp16 --dataset_name discovery \
        --dataset_config_name discovery --context_col sentence2 \
        --to_predict_next_col sentence1 --save_total_limit 1 --save_steps 20000

Training and evaluation on HellaSwag

$ cd hellaswag

Then run:

export BS=56
export EPOCH=4
export MAX_SEQ_LENGTH=96
export WARMUP_STEPS=1200
export LR=2e-5


export MODEL_PATH='google/electra-large-discriminator'
export OUTPUT_PATH='' # path where your discriminator will be stored

python3 run_hellaswag.py \
                      --model_name_or_path $MODEL_PATH \
                      --do_train \
                      --do_eval \
                      --num_train_epochs $EPOCH \
                      --output_dir $OUTPUT_PATH  \
                      --per_device_train_batch_size $BS \
                      --per_device_eval_batch_size $((BS*4)) \
                      --max_seq_length $MAX_SEQ_LENGTH \
                      --save_strategy epoch \
                      --evaluation_strategy epoch \
                      --warmup_steps $WARMUP_STEPS \
                      --fp16 \
                      --overwrite_output_dir
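
To see what WARMUP_STEPS and LR imply for the learning rate, here is a small Python sketch of a linear warmup followed by linear decay, which is the default schedule of the Hugging Face Trainer. Whether run_hellaswag.py keeps that default is an assumption, and the total step count used below is a hypothetical value, since it depends on dataset size, batch size and number of devices.

```python
def lr_at_step(step, total_steps, peak_lr=2e-5, warmup_steps=1200):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return peak_lr * remaining


TOTAL_STEPS = 3000  # hypothetical; not taken from the repository

for step in (0, 600, 1200, 2100, 3000):
    print(step, lr_at_step(step, TOTAL_STEPS))
```

With these settings the rate climbs for the first 1200 optimizer steps and only then starts to decay, so a short run spends a large share of training in warmup.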