
founder-distribution-data


Commits

List of commits on branch master.
Unverified
3f1db46a890086b74a47635dea331cf0fbf7bf84

copy/paste fail

mminimaxir committed 11 years ago
Unverified
4e137fe47f216321c7f26e0f66acb4c3c6d8796f

initial

mminimaxir committed 11 years ago
Unverified
030aa77256646a8af985b8a876120d5bd79baa1b

Initial commit

mminimaxir committed 11 years ago


founder-distribution-data

Code and methodology for reproducing the founder gender data. Saves a CSV of every founder that can be identified at startups which received funding from the specified venture capital firm. (Also saves a bonus CSV of the male and female counts by year.)

The code works as follows:

  1. Load the data on all startup investments from the November export of CrunchBase data.

  2. Filter to investments made by the user-specified firm (e.g. Y Combinator).

  3. For each startup, query CrunchBase for employees from that startup. For each employee, if they hold the title of "Founder," add that employee to the list.

  4. For each founder, guess the gender by comparing the founder's first name against Carnegie Mellon University's lists of common male and female names used for NLP. If the name yields no guess, the gender is found manually later.

  5. Remove duplicate founder entries when a founder has more than one investment in the same year (extremely rare).
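
The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the field names (`startup`, `investor`, `year`), the tiny name sets, the sample rows, and the in-memory `SAMPLE_EMPLOYEES` lookup (standing in for the CrunchBase employee query) are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical stand-ins for the CMU name lists (the real lists are far longer).
MALE_NAMES = {"drew", "paul", "brian"}
FEMALE_NAMES = {"jessica", "kathryn"}

# Hypothetical sample rows; the real input is the CrunchBase investment export.
SAMPLE_INVESTMENTS = [
    {"startup": "AlphaCo", "investor": "Y Combinator", "year": 2012},
    {"startup": "AlphaCo", "investor": "Y Combinator", "year": 2012},  # second round, same year
    {"startup": "BetaCo", "investor": "Other Fund", "year": 2012},
    {"startup": "GammaCo", "investor": "Y Combinator", "year": 2013},
]

# Hypothetical replacement for the per-startup CrunchBase employee query.
SAMPLE_EMPLOYEES = {
    "AlphaCo": [("Drew Example", "Founder"), ("Pat Example", "Engineer")],
    "GammaCo": [("Jessica Example", "Founder"), ("Sam Example", "Founder")],
}


def guess_gender(full_name):
    """Return 'male', 'female', or None if the first name gives no clear guess."""
    parts = full_name.split()
    if not parts:
        return None
    first = parts[0].lower()
    in_male = first in MALE_NAMES
    in_female = first in FEMALE_NAMES
    if in_male and not in_female:
        return "male"
    if in_female and not in_male:
        return "female"
    return None  # unknown or ambiguous: left for manual lookup


def build_founder_list(investments, employees, firm):
    """Steps 2, 3, 4 and 5: filter by firm, collect founders, guess gender, dedupe."""
    seen = set()
    founders = []
    for inv in investments:
        if inv["investor"] != firm:
            continue
        for name, title in employees.get(inv["startup"], []):
            # Exact title match is an assumption of this sketch.
            if title != "Founder":
                continue
            key = (name, inv["year"])
            if key in seen:  # same founder, same year: keep only one entry
                continue
            seen.add(key)
            founders.append({
                "founder": name,
                "startup": inv["startup"],
                "year": inv["year"],
                "gender": guess_gender(name),
            })
    return founders


def counts_by_year(founders):
    """Tally founders per (year, gender) pair, as in the bonus CSV."""
    return Counter((f["year"], f["gender"]) for f in founders)
```

Calling `build_founder_list(SAMPLE_INVESTMENTS, SAMPLE_EMPLOYEES, "Y Combinator")` on the sample data yields three founders, one of whom has a gender of `None` and would need the manual lookup described in step 4.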

The final processed data is available at this Google Spreadsheet (manual changes are bolded).