dual-system-for-visual-language-reasoning

Commits

Verified
a2c4e462d7e94c24d91f1ca2b379c290ae91bc67

Update build_atomic23_QA.py

maryamfz committed a year ago
Verified
032466f09ecd4db5d2078be89f48634c23783bb6

Update build_atomic1_QA.py

maryamfz committed a year ago
Unverified
f507d374b9d0c434eca0c76e926cee16c761e003

Updated readme

committed a year ago
Verified
cda6503fa18c0253a37d9b564f35e758fc565a84

Merge pull request #3 from facebookresearch/automated_fixup_code_of_conduct_file_exists

maryamfz committed a year ago
Verified
e02a89bc6f8a6a6d043f0cd79977a056c55ced59

Merge pull request #2 from facebookresearch/automated_fixup_contributing_file_exists

maryamfz committed a year ago
Unverified
a5cd412f258938b10f1badfd6a9b1ea880c8aa27

OSS Automated Fix: Addition of Code of Conduct

facebook-github-bot committed a year ago

README

DOMINO: A Dual-System for Multi-step Visual Language Reasoning

This is a PyTorch implementation of DOMINO: A Dual-System for Multi-step Visual Language Reasoning.

TL;DR: We propose DOMINO, a dual-system for multi-step visual language reasoning that outperforms existing models on challenging chart question answering datasets.

DOMINO alternates between System-2 (a prompted LLM) and System-1 (a visual encoder-text decoder) to answer complex questions over charts. The text in blue callouts is generated by System-2. The text in green callouts is generated by System-1 and appended directly to the generation sequence of System-2. The chart and the question are from ChartQA (Masry et al., 2022).
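
The alternation described above can be sketched as a simple control loop. This is only an illustration under stated assumptions, not the repository's implementation: `run_dual_system`, `toy_system2`, `toy_system1`, the `QUERY:`/`ANSWER:`/`FINAL:` markers, and the toy chart values are all hypothetical stand-ins for the prompted LLM and the fine-tuned vision module.

```python
from typing import Callable, Dict

# Hypothetical markers; the prompt format actually used by DOMINO may differ.
QUERY_PREFIX = "QUERY:"
FINAL_PREFIX = "FINAL:"


def run_dual_system(
    question: str,
    system2: Callable[[str], str],
    system1: Callable[[str], str],
    max_steps: int = 8,
) -> str:
    """Alternate between System-2 (reasoning) and System-1 (visual extraction).

    system2 maps the sequence generated so far to the next line of text.
    system1 maps a visual query to text extracted from the chart.
    """
    sequence = f"Question: {question}\n"
    for _ in range(max_steps):
        step = system2(sequence)
        sequence += step + "\n"
        if step.startswith(FINAL_PREFIX):
            return step[len(FINAL_PREFIX):].strip()
        if step.startswith(QUERY_PREFIX):
            # System-1 output is appended directly to System-2's sequence.
            observation = system1(step[len(QUERY_PREFIX):].strip())
            sequence += f"ANSWER: {observation}\n"
    raise RuntimeError("no final answer within max_steps")


# Toy stand-ins so the loop can be exercised without any models.
CHART: Dict[str, float] = {"2019": 10.0, "2020": 14.0}


def toy_system1(query: str) -> str:
    """Pretend to read one value off the chart."""
    return str(CHART[query])


def toy_system2(sequence: str) -> str:
    """Ask for two values, then subtract them."""
    answers = [
        line.split(": ", 1)[1]
        for line in sequence.splitlines()
        if line.startswith("ANSWER: ")
    ]
    if len(answers) == 0:
        return "QUERY: 2019"
    if len(answers) == 1:
        return "QUERY: 2020"
    return f"FINAL: {float(answers[1]) - float(answers[0])}"


result = run_dual_system(
    "How much did the value grow from 2019 to 2020?", toy_system2, toy_system1
)
print(result)
```

In a real setup, `system2` would wrap a call to the prompted LLM and `system1` would wrap inference with the vision module on the chart image; the loop itself would stay the same.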

Code folders

(1) system1-vision: Fine-tuning and inference with the vision module.

(2) system2-lm: Prompting the LM to solve downstream tasks.

Dependencies

  • Python >= 3.6
  • PyTorch == 1.12.1
  • transformers == 4.29.2
  • fairscale == 0.4.6
  • sentencepiece == 0.1.99
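
Because most of these versions are pinned exactly, it can help to check an environment before launching a job. The following is a minimal sketch, not part of the repository: the helper names `find_mismatches` and `installed_versions` are hypothetical, the PyPI distribution name `torch` is assumed for PyTorch, and `importlib.metadata` requires Python 3.8 or later, which is newer than the minimum listed above.

```python
from typing import Dict, List, Optional

# Pinned versions copied from the list above (distribution name -> version).
PINNED: Dict[str, str] = {
    "torch": "1.12.1",
    "transformers": "4.29.2",
    "fairscale": "0.4.6",
    "sentencepiece": "0.1.99",
}


def find_mismatches(installed: Dict[str, Optional[str]]) -> List[str]:
    """Return one message per package that is missing or at another version."""
    problems = []
    for name, wanted in PINNED.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: not installed (want {wanted})")
        elif found.split("+")[0] != wanted:
            # Local build tags such as "+cu113" are dropped before comparing.
            problems.append(f"{name}: found {found} (want {wanted})")
    return problems


def installed_versions() -> Dict[str, Optional[str]]:
    """Look up installed versions; importlib.metadata needs Python 3.8+."""
    from importlib import metadata

    versions: Dict[str, Optional[str]] = {}
    for name in PINNED:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for message in find_mismatches(installed_versions()):
        print(message)
```

An empty output means every pinned package was found at the expected version.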

Data

We used the following datasets:

Fine-tuning a vision module for visual information extraction

cd system1-vision
sbatch ./scripts/finetune_deplot.sh <HOME_DIR>

After training, the vision module checkpoint is saved to <HOME_DIR>/outputs/checkpoint. This path is referred to as $VISION_CHECKPOINT for later use.
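
If later steps are driven from Python, the checkpoint location mentioned above can be derived from the same home directory. This is a small sketch under assumptions: the helper names are hypothetical, and whether the repository's scripts read a `VISION_CHECKPOINT` environment variable is not confirmed by this README.

```python
import os
from pathlib import Path


def vision_checkpoint_path(home_dir: str) -> Path:
    """Return <home_dir>/outputs/checkpoint, the location named above."""
    return Path(home_dir) / "outputs" / "checkpoint"


def export_vision_checkpoint(home_dir: str) -> str:
    """Set VISION_CHECKPOINT for child processes and return its value.

    Assumption: downstream scripts may or may not read this variable.
    """
    path = vision_checkpoint_path(home_dir)
    os.environ["VISION_CHECKPOINT"] = str(path)
    return os.environ["VISION_CHECKPOINT"]
```

Calling `vision_checkpoint_path(home_dir).exists()` before starting the language-model stage is a cheap way to confirm that fine-tuning actually produced a checkpoint.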

Prompting LM for downstream tasks

The scripts for the different tasks are stored in system2-lm/scripts. For example, to run the ChartQA script:

cd system2-lm
./scripts/run_dualsys_chartQA.sh <HOME_DIR>

License

The code is CC-BY-NC 4.0 licensed, as found in the LICENSE file.

Citation

Please cite our paper if DOMINO is used in your work:

@misc{wang2023domino,
      title={DOMINO: A Dual-System for Multi-step Visual Language Reasoning}, 
      author={Peifeng Wang and Olga Golovneva and Armen Aghajanyan and Xiang Ren and Muhao Chen and Asli Celikyilmaz and Maryam Fazel-Zarandi},
      year={2023},
      eprint={},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}