DELPHI: Data for Evaluating LLMs' Performance in Handling Controversial Issues

This repository contains the DELPHI dataset and the paper's appendix. The dataset consists of nearly 30,000 questions drawn from the Quora Question Pairs dataset, each with a consensus label aggregated from multiple human reviewers following a deliberate set of guidelines designed to meaningfully capture the concept of controversy.

This dataset was introduced in our paper, accepted at EMNLP 2023: DELPHI: Data for Evaluating LLMs' Performance in Handling Controversial Issues.

Abstract

Controversy is a reflection of our zeitgeist, and an important aspect of any discourse. The rise of large language models (LLMs) as conversational systems has increased public reliance on these systems for answers to their various questions. Consequently, it is crucial to systematically examine how these models respond to questions pertaining to ongoing debates. However, few such datasets exist that provide human-annotated labels reflecting contemporary discussions. To foster research in this area, we propose a novel construction of a controversial questions dataset, expanding upon the publicly released Quora Question Pairs Dataset. This dataset presents challenges concerning knowledge recency, safety, fairness, and bias. We evaluate different LLMs using a subset of this dataset, illuminating how they handle controversial issues and the stances they adopt. This research ultimately contributes to our understanding of LLMs' interaction with controversial issues, paving the way for improvements in their comprehension and handling of complex societal debates.

Citing

If you use this dataset in your research, please cite our paper:

@inproceedings{sun2023Delphi,
  title={DELPHI: Data for Evaluating LLMs' Performance in Handling Controversial Issues},
  author={David Q. Sun and Artem Abzaliev and Hadas Kotek and Zidi Xiu and Christopher Klein and Jason D. Williams},
  booktitle={EMNLP},
  year={2023}
}

Contact: David Q. Sun (dqs AT apple.com)

Data Downloading

DELPHI Dataset

  • qid: controversial question ID, corresponding to the original Kaggle training dataset <train.csv.zip>
  • r1: human-annotated strong emotional reaction score (1 = lowest, 5 = highest)
  • r2: human-annotated diverse and opposing opinions score (1 = lowest, 5 = highest)
  • Controversial question: Boolean label indicating the human-annotated consensus
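
For quick inspection, a minimal loading sketch with pandas is shown below. The TSV filename and the exact column headers are assumptions for illustration; adjust them to match the file shipped under /dataset/.

```python
# Minimal sketch for exploring the DELPHI annotations with pandas.
# The filename and column names below are assumptions; check the TSV
# under /dataset/ and adjust accordingly.
import pandas as pd

df = pd.read_csv("dataset/delphi_annotations.tsv", sep="\t")  # hypothetical filename
print(df.columns.tolist())  # expect qid, r1, r2, and the Boolean controversy label

# Summary of annotator scores for emotional reaction (r1) and opposing opinions (r2)
print(df[["r1", "r2"]].describe())

# Count questions labelled controversial (assumed column name)
label_col = "controversial question"
if label_col in df.columns:
    print(df[label_col].value_counts())
```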

Repository Structure

The repository is organized as follows:

  • /appendix/: Appendix for the original paper
  • /dataset/: TSV file of annotated controversial questions

Annotation UI

[Image: screenshot of the annotation UI]

Data License

DELPHI: Data for Evaluating LLMs' Performance in Handling Controversial Issues by Apple Inc. is licensed under CC BY-NC 4.0