DoLoMiTes: Domain-Specific Long-Form Methodical Tasks

This repository includes data for the DoLoMiTes (Domain-Specific Long-Form Methodical Tasks) evaluation benchmark, described in our paper.

Abstract

Experts in various fields routinely perform methodical writing tasks to plan, organize, and report their work. From a clinician writing a differential diagnosis for a patient, to a teacher writing a lesson plan for students, these tasks are pervasive and require methodically generating structured long-form output for a given input. We develop a typology of methodical tasks structured in the form of a task objective, procedure, input, and output, and introduce DoLoMiTes, a novel benchmark with specifications for 519 such tasks elicited from hundreds of experts across 25 fields. Our benchmark further contains specific instantiations of methodical tasks with concrete input and output examples (1,857 in total), which we obtain by collecting expert revisions of up to 10 model-generated examples of each task. We use these examples to evaluate contemporary language models, highlighting that automating methodical tasks is a challenging long-form generation problem, as it requires performing complex inferences while drawing upon the given context as well as domain knowledge.

Data

The benchmark data is available in JSONL format at:

  • Tasks: 519 task descriptions provided by experts.
  • Tasks Validation Labels: Labels for task validation provided by 3 independent experts.
  • Examples: Examples of the tasks post-edited by experts. We provide the development set (830 examples) with reference outputs and the test set (1,037 examples) without reference outputs.
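
Since the files listed above are in JSONL format (one JSON object per line), they can be inspected with the Python standard library alone. The following is a minimal sketch, not part of the released code: the helper names `read_jsonl` and `summarize` are hypothetical, and no field names are assumed because the record schema is not documented in this README.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple


def read_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Report which line failed so a damaged file is easy to fix.
                raise ValueError(f"{path}:{line_number}: invalid JSON") from err


def summarize(path: str) -> Tuple[int, List[str]]:
    """Return the record count and the sorted set of top-level keys seen."""
    records = list(read_jsonl(path))
    keys = sorted({key for record in records for key in record})
    return len(records), keys
```

For example, `summarize("tasks.jsonl")` would return the number of records and the top-level field names found in that file, where `tasks.jsonl` is a placeholder for wherever a downloaded file was saved locally. Listing the keys first is a cautious way to discover the actual schema before writing code that depends on specific fields.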

Citing this work

If you use any of the material here, please cite the following paper:

@article{malaviya2024dolomites,
  title={DOLOMITES: Domain-Specific Long-Form Methodical Tasks},
  author={Malaviya, Chaitanya and Agrawal, Priyanka and Ganchev, Kuzman and Srinivasan, Pranesh and Huot, Fantine and Berant, Jonathan and Yatskar, Mark and Das, Dipanjan and Lapata, Mirella and Alberti, Chris},
  journal={arXiv preprint arXiv:2405.05938},
  year={2024}
}

License and disclaimer

Copyright 2024 DeepMind Technologies Limited

All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0

All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode

Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.

This is not an official Google product.