Bad Seeds: Evaluating Lexical Methods for Bias Measurement

This repository contains the code for our published paper "[Re] Badder Seeds: Reproducing the Evaluation of Lexical Methods for Bias Measurement", in which we reproduce the paper "Bad Seeds: Evaluating Lexical Methods for Bias Measurement" as part of the 2021 ML Reproducibility Challenge.

Usage

Setup

Requirements

To specify our setup precisely, we use Poetry to keep track of dependencies and Python versions. The exact Python and package versions can be found in the pyproject.toml and poetry.lock files.

For Poetry users, setting up is as easy as running

poetry install

We also provide an environment.yml file for Conda users who do not wish to use Poetry. In this case, simply run

conda env create -f environment.yml

Finally, if neither of the above options suits you, we also provide a requirements.txt file for pip users. In this case, simply run

pip install -r requirements.txt

NOTE: After installation is complete, please run the following command to download the spaCy English language model required by this repository:

python -m spacy download en_core_web_sm
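To confirm that both spaCy and the downloaded model are visible to the active environment, a quick check along the lines of the sketch below may help. It only looks the packages up on the import path with importlib.util.find_spec and does not load the model. It assumes the model is installed as the importable package en_core_web_sm (which is how spacy download installs it); the helper name missing_modules is hypothetical and not part of this repository.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path.

    Hypothetical helper: it only locates the modules, it does not import them.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["spacy", "en_core_web_sm"])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("spaCy and en_core_web_sm are installed.")
```

Run it from the same environment you installed into; if en_core_web_sm is reported missing, re-run the download command above.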

Data and Models

Users are strongly encouraged to read DATA.md before proceeding.

Repository Structure

.
├── badseeds/ # scripts
│   └── __init__.py
├── notebooks/ # notebooks for reproduction, which import scripts
│   └── results.ipynb
├── report/ # files for typesetting our report
├── README.md # you are here
├── DATA.md # documentation on data
├── poetry.lock # handled by poetry
├── pyproject.toml # if you are using poetry
├── gen_pip_conda_files.sh # for generating the pip and conda files with poetry
├── seed_set_pairings.csv # the gathered seed set pairings we contribute
├── config.json # example config file specifying dir/file paths
├── environment.yml # if you are using conda
└── requirements.txt # if you are using pip
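The repository ships config.json as an example of how directory and file paths are specified, and seed_set_pairings.csv as a contributed data file. The sketch below shows one way to load both with the standard library. It is not the loader used in badseeds/: all helper names are hypothetical, and it assumes config.json is a flat JSON object whose string values are paths, and that seed_set_pairings.csv has a header row. No particular key or column names are assumed.

```python
import csv
import json
from pathlib import Path


def paths_from_config(raw):
    """Convert top-level string values of a parsed JSON config to Path objects.

    Hypothetical helper; assumes the config is a flat JSON object.
    """
    if not isinstance(raw, dict):
        raise ValueError("config must be a JSON object")
    # Leave non-string values (numbers, lists, nested objects) untouched.
    return {key: Path(value) if isinstance(value, str) else value
            for key, value in raw.items()}


def load_path_config(config_file):
    """Read a JSON config file (e.g. config.json) and convert its path strings."""
    with Path(config_file).open(encoding="utf-8") as f:
        return paths_from_config(json.load(f))


def parse_seed_pairings(lines):
    """Parse CSV lines with a header row into a list of dicts, one per row."""
    return list(csv.DictReader(lines))


def read_seed_pairings(csv_file):
    """Read a CSV file such as seed_set_pairings.csv (hypothetical helper)."""
    with Path(csv_file).open(newline="", encoding="utf-8") as f:
        return parse_seed_pairings(f)
```

For example, load_path_config("config.json") returns a dict whose string entries are pathlib.Path objects, so they can be combined with the / operator.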

Users interested only in reproducing our results should visit the notebooks/ subdirectory, which contains notebooks that can be run to reproduce the results of our paper.

More curious users are invited to visit the badseeds/ subdirectory, which contains the actual implementation.

Development

Packages and Environment

If you wish to contribute to this repository, please use Poetry when installing new packages, as this makes dependency management easier and more transparent.

We strongly suggest using a virtual environment. Poetry will create an environment if one is not already active, using information from the pyproject.toml file.

Alternatively, you can manage environments yourself, e.g. by creating one with conda and activating it before use. Poetry will automatically detect the active environment and install packages into it. If you go this route, make sure the environment uses Python 3.9. With conda, this means creating the environment as follows:

conda create --name badseeds python=3.9
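Before running poetry install inside a self-managed environment, it can be worth confirming that the active interpreter really is Python 3.9. A minimal sketch; the function name matches_required_python and the REQUIRED constant are our own illustration, not part of the repository:

```python
import sys

# The Python version this README asks for (major, minor).
REQUIRED = (3, 9)


def matches_required_python(required=REQUIRED, actual=None):
    """Return True if the (given or running) interpreter's major.minor equals `required`."""
    if actual is None:
        actual = sys.version_info[:2]
    return tuple(actual[:2]) == tuple(required)


if __name__ == "__main__":
    if not matches_required_python():
        found = ".".join(str(part) for part in sys.version_info[:2])
        print(f"Warning: expected Python {REQUIRED[0]}.{REQUIRED[1]}, found {found}")
```

Only the major and minor components are compared, so any 3.9.x patch release passes.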

Contributing

Approved contributors can (and should) create their own branch and work there before merging into the main branch. External contributors can instead fork the repository and open a pull request when ready.