
# MuLD: The Multitask Long Document Benchmark

MuLD (Multitask Long Document Benchmark) is a set of 6 NLP tasks whose inputs consist of at least 10,000 words. The benchmark covers a wide variety of task types including translation, summarization, question answering, and classification. Additionally, it spans a range of output lengths, from a single-word classification label all the way up to an output longer than the input text.

[Image: table summarizing the MuLD tasks]

This repo contains the official code for the paper *MuLD: The Multitask Long Document Benchmark*.

## Quickstart

The easiest way to load the data is with the Hugging Face `datasets` library:

```python
import datasets

# Load any of the 6 MuLD tasks by name:
ds = datasets.load_dataset("ghomasHudson/muld", "NarrativeQA")
ds = datasets.load_dataset("ghomasHudson/muld", "HotpotQA")
ds = datasets.load_dataset("ghomasHudson/muld", "Character Archetype Classification")
ds = datasets.load_dataset("ghomasHudson/muld", "OpenSubtitles")
ds = datasets.load_dataset("ghomasHudson/muld", "AO3 Style Change Detection")
ds = datasets.load_dataset("ghomasHudson/muld", "VLSP")
```
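Each example pairs a long document input with a target output. The snippet below is a minimal sketch of checking the input length; it uses a locally constructed example so no download is needed, and the `input`/`output` field names are an assumption (in a real run you would iterate over a loaded split such as `ds["test"]`):

```python
# Minimal sketch of a MuLD-style example (field names are assumed,
# not taken from the official loader). MuLD inputs are documents of
# at least 10,000 words.
example = {
    "input": "What is the title? " + "word " * 12000,  # stand-in long document
    "output": ["Some answer"],
}

# Word count of the input, as a rough proxy for document length.
n_words = len(example["input"].split())
assert n_words >= 10_000
```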

Or by cloning this repo:

```python
import datasets
ds = datasets.load_dataset("./muld.py", "NarrativeQA")
...
```

## Manual Download

If you prefer to download the data files yourself:

## Citation

If you use our benchmark, please cite the paper:

```bibtex
@InProceedings{hudson-almoubayed:2022:LREC,
  author    = {Hudson, George  and  Al Moubayed, Noura},
  title     = {MuLD: The Multitask Long Document Benchmark},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  month     = {June},
  year      = {2022},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {3675--3685},
  url       = {https://aclanthology.org/2022.lrec-1.392}
}
```

Additionally, please cite the datasets we used (particularly NarrativeQA, HotpotQA, and OpenSubtitles, whose data we use directly with limited filtering).

## Dataset Metadata

The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.

| property | value |
| --- | --- |
| name | MuLD |
| alternateName | Multitask Long Document Benchmark |
| url | https://github.com/ghomasHudson/muld |
| description | MuLD (Multitask Long Document Benchmark) is a set of 6 NLP tasks whose inputs consist of at least 10,000 words. The benchmark covers a wide variety of task types including translation, summarization, question answering, and classification. Additionally, it spans a range of output lengths, from a single-word classification label all the way up to an output longer than the input text. |
| citation | https://arxiv.org/abs/2202.07362 |
| creator | name: Thomas Hudson; sameAs: https://orcid.org/0000-0003-3562-3593 |