GitXplorerGitXplorer
g

inli

public
1 stars
1 forks
1 issues

Commits

List of commits on branch main.
Unverified
5fe253e3cc1686dbb256d23ced4684cfe2750d63

renamed resources folder

committed 2 months ago
Unverified
05c2e93346d175dcf4b7d9e1324c906c058a2419

changed License

committed 2 months ago
Unverified
3acdcd35a11b2c3fd9ba8d4b412628f27c5daeff

added License and updated README

committed 2 months ago
Unverified
1e8d7095aa5e7d90c293f161fb9e005fcd4afbae

adding license + disclaimer

committed 2 months ago
Unverified
d40bfc1ca74a14e9f4f56cd4ef9f8079c102ef3b

uploading INLI data

committed 2 months ago

README

The README file for this repository.

Implied NLI

Dataset Overview

Implied NLI, or INLI is a corpus of 10k premises that mirror real-world communication and 40k hypotheses that are implied, explicit, neutral, and contradictory.

  • Number of examples: 40,000
  • Number of labels: 4 (implied entailment entailment, explicit, neutral, contradictory)

On top of the raw data, we also include all prompts used to generate INLI, all human annotations described in the paper, and the outputs of the LLM experiments described in the paper.

For more details on the design and content of the dataset, please see our paper

Data Format

Our raw dataset is split into three csv files.

  • Size of training dataset: 32,000
  • Size of test dataset: 4,000
  • Size of validation dataset: 4,000

In each of these files, one row represents a premise and four corresponding hypotheses. These files include the following columns:

  • dataset: The original dataset containing the conversation or social norm used to create the premise
  • premise: A premise
  • implied_entailment: A hypothesis implicitly entailed in the premise
  • explicit_entailment: A hypothesis explicitly entailed in the premise
  • neutral: A hypothesis neither entailed nor contradicted by the premise
  • contradiction: A hypothesis contradicted by the premise

Citation

Contact

Shreya Havaldar

Disclaimer

INLI may contain premises and hypotheses involving sensitive content, e.g. descriptions of violence, harassment, profanity, etc. All such content originates from the datasets INLI is built off of (Ludwig, Circa, NormBank, SocialChem) and was not introduced by Gemini during the generation of INLI. All such content is purely hypothetical and not based on real people or events. Anyone using this dataset should be aware of these limitations of the dataset.