TOAD

This software project accompanies the research paper, TOAD: Task-Oriented Automatic Dialogs with Diverse Response Styles, accepted to the Findings of ACL 2024.

TOAD is a synthetic task-oriented dialog (TOD) dataset that simulates realistic app-context interactions and provides multiple system response styles (verbosity and mirroring of user expressions).

Run Data Synthesis

Preparation:

  • Install dependencies from requirements.txt.
  • We use an OpenAI-compatible API to make requests to LLMs. Set the environment variables OPENAI_API_KEY, BASE_URL (optional), and ENGINE (e.g. "gpt-3.5-turbo") to configure the backend LLM. You can put these in a dotenv (.env) file; see the sketch below.
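
A minimal sketch of this configuration, assuming python-dotenv and the openai client are provided by requirements.txt; the repository's actual loading code may differ:

import os
from dotenv import load_dotenv   # reads variables from a local .env file
from openai import OpenAI

# Hypothetical .env contents:
#   OPENAI_API_KEY=sk-...
#   BASE_URL=https://api.openai.com/v1   # optional, for OpenAI-compatible servers
#   ENGINE=gpt-3.5-turbo
load_dotenv()

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.getenv("BASE_URL"),      # None falls back to the default endpoint
)
engine = os.getenv("ENGINE", "gpt-3.5-turbo")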

Synthesis: The data synthesis pipeline is divided into 3 steps. The generated files will be stored in data/.

Step 1: Context generation

  1. Run python -m context_generation.occupation_generator to synthesize occupations.json (you can skip this step and re-use the existing file).
  2. Run python -m context_generation.persona_generator to synthesize personas.jsonl using occupations.
  3. Run python -m context_generation.context_generator to synthesize contexts.jsonl using personas (a sketch for inspecting the generated files follows this list).
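
To sanity-check the intermediate outputs, here is a quick sketch for previewing the generated JSONL files. The paths are assumptions based on the filenames and the data/ output directory mentioned above, and the record fields are whatever the generators produce:

import json

# Preview the generated files (paths assumed; adjust to your setup).
for path in ("data/personas.jsonl", "data/contexts.jsonl"):
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    print(f"{path}: {len(records)} records")
    if records:
        print(json.dumps(records[0], indent=2)[:500])  # first record, truncated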

Step 2: Dialog generation

  1. Run the code in dialog_generation to synthesize dialogs based on the contexts. Example command (a batch-run sketch follows the flag descriptions):
python -m dialog_generation.main \
    --phenomena='compound' \
    --output_dir='data/dialogs' \
    --number_of_data=1000 \
    --full_options_mode \
    --thread_num=15
  • --phenomena specifies the phenomena to use in dialog generation; it can be one of compound, compositional, or none.
  • --output_dir specifies the path to save the generated dialogs.
  • --number_of_data specifies the number of dialogs to generate.
  • --full_options_mode requests generation of all six response style options.
  • --thread_num specifies the number of threads to run in parallel.
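
If you want to generate dialogs for every phenomenon setting, a minimal batch sketch using the flags documented above (writing each run to its own subdirectory is an assumption for illustration, not a repository convention):

import subprocess

# Run dialog generation once per phenomenon setting.
for phenomena in ("compound", "compositional", "none"):
    subprocess.run(
        [
            "python", "-m", "dialog_generation.main",
            f"--phenomena={phenomena}",
            f"--output_dir=data/dialogs/{phenomena}",
            "--number_of_data=1000",
            "--full_options_mode",
            "--thread_num=15",
        ],
        check=True,  # stop if a run fails
    )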

To customize dialog generation by modifying schema.json, refer to the documentation in the dialog_generation directory.

Step 3: Quality control

  1. Run python -m quality_control.main to filter out inconsistent dialogs using the backend LLM.

Citation

@inproceedings{liu2024toad,
    title = "{TOAD}: Task-Oriented Automatic Dialogs with Diverse Response Styles", 
    author = "Liu, Yinhong  and
      Fang, Yimai  and
      Vandyke, David  and
      Collier, Nigel",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    year = "2024",
    url = "https://arxiv.org/abs/2402.10137"
}