Normalized dataset of 70k job titles
The data is normalized in the following ways:
- lowercase
- hyphens ("-") replaced with a space
- commas (",") removed
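The rules above can be sketched in Python as follows. This is a minimal illustration, not the logic of ./format.sh. The helpers normalize_title and normalize_lines are hypothetical. Collapsing repeated whitespace, dropping blank lines, deduplicating exact matches and sorting are assumptions that the rules above do not state.

```python
def normalize_title(title: str) -> str:
    """Apply the listed rules to one title (hypothetical helper)."""
    title = title.lower()            # lowercase
    title = title.replace("-", " ")  # hyphens replaced with a space
    title = title.replace(",", "")   # commas removed
    # Assumption: collapse the runs of spaces the replacements can leave behind.
    return " ".join(title.split())


def normalize_lines(lines: list[str]) -> list[str]:
    """Normalize every non-blank line.

    Assumption: exact duplicates are dropped and the result is sorted.
    """
    return sorted({normalize_title(line) for line in lines if line.strip()})


if __name__ == "__main__":
    print(normalize_title("Senior Front-End Developer, Remote"))
```

Running it prints "senior front end developer remote".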
Known caveats:
- Duplicates such as "a and p mechanic" and "a&p mechanic"
- Non-English titles such as "ab initio etl developer"
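One possible way to find duplicates like the "a and p" / "a&p" pair is to group titles by a looser comparison key. The sketch below rests on one assumption: spelling out "&" as "and" is enough to catch this kind of duplicate. The helpers duplicate_key and find_duplicates are hypothetical and are not part of this repository.

```python
from collections import defaultdict


def duplicate_key(title: str) -> str:
    """Build a loose comparison key (assumption: "&" is equivalent to "and")."""
    key = title.replace("&", " and ")  # "a&p mechanic" -> "a and p mechanic"
    return " ".join(key.split())       # collapse the extra spaces


def find_duplicates(titles: list[str]) -> list[list[str]]:
    """Return groups of titles that share the same key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for title in titles:
        groups[duplicate_key(title)].append(title)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    print(find_duplicates(["a and p mechanic", "a&p mechanic", "welder"]))
```

The output only flags candidates. Choosing which spelling to keep is still a manual decision.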
Feel free to open a pull request that fixes the caveats listed above or adds any other enhancement.
Only edit job-titles.txt. After doing so, run ./format.sh.
This dataset is a collection of the following sources: