Code and methodology for reproducing the founder gender data. Saves a CSV of every founder that could be identified at startups funded by the specified venture capital firm. (Also saves a bonus CSV of the male and female counts by year.)
Code works as follows:
- Uses the data on all investments in startups from the November Export of CrunchBase data.
- Filters on investments made by the user-specified firm (e.g. Y Combinator).
- For each startup, queries CrunchBase for that startup's employees. Each employee who holds the title of "Founder" is added to the list.
- For each founder, guesses the gender by comparing the founder's first name against Carnegie Mellon University's lists of common male and female names used for NLP. If there is no guess, the gender is found manually later.
- Removes duplicate founder entries if a founder has more than one investment in a year (extremely rare).
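
The name-matching, de-duplication, and yearly count steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names, the record layout, and the tiny in-memory name sets are all hypothetical stand-ins (the real name lists are much longer and would be loaded from files). Treating a name that appears on both lists as "no guess" is also an assumption.

```python
from collections import Counter

# Tiny stand-ins for the male and female name lists (assumption: the real
# lists are far longer and would be read from files).
MALE_NAMES = {"john", "paul", "drew"}
FEMALE_NAMES = {"jessica", "anne", "drew"}


def guess_gender(full_name, male_names=MALE_NAMES, female_names=FEMALE_NAMES):
    """Return "male", "female", or None when the first name gives no clear guess."""
    parts = full_name.strip().split()
    if not parts:
        return None
    first = parts[0].lower()
    in_male = first in male_names
    in_female = first in female_names
    if in_male and not in_female:
        return "male"
    if in_female and not in_male:
        return "female"
    # Unknown name, or a name on both lists (assumed ambiguous):
    # leave it for manual lookup later.
    return None


def dedupe_founders(rows):
    """Keep one row per (founder, year), preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["founder"], row["year"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def counts_by_year(rows):
    """Count founders per (year, guessed gender) pair."""
    return Counter((row["year"], guess_gender(row["founder"])) for row in rows)


# Hypothetical records, as they might look after filtering on one firm.
founders = [
    {"founder": "John Smith", "year": 2012},
    {"founder": "John Smith", "year": 2012},  # second investment, same year
    {"founder": "Jessica Lee", "year": 2012},
    {"founder": "Drew Park", "year": 2013},   # first name is on both lists
]

unique_founders = dedupe_founders(founders)
yearly_counts = counts_by_year(unique_founders)
```

In this sketch, founders for whom `guess_gender` returns `None` stay in the output so they can be filled in by hand, which mirrors the manual step described above.
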
Final processed data is available at this Google Spreadsheet (manual changes are bolded).