earnest

Commits

List of commits on branch main.
Verified
fd45ecea6ffd8bd7b62eafdd05a80cfac1fffdaf

Update README.md

ccaptainpete committed 11 days ago
Unverified
927c2ae7e24eca118951ecb536789f5fb5c8c0c2

Project snapshot with docs/license/citation

ccaptainpete committed 11 days ago
Unverified
97327cfd28dda4647b530a5b36ac43a97842eefb

Initial commit

ccaptainpete committed 2 months ago

README

The README file for this repository.

Earnest

A quick baby-name preference learning app for unabashed data science nerds.

  • Uses the names dataset from https://pypi.org/project/names-dataset/
  • Converts each name into an 875-wide feature frame comprising:
    • 1: male prevalence
    • 1: female prevalence
    • 105: country prevalence
    • 768: embedding vector
  • Learns a ranking by soliciting preferences in repeated 1-vs-20 rounds
  • Displays the top and bottom 50 names
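
As a rough illustration of the feature layout listed above, the sketch below concatenates the four groups into a single 875-wide row. The function name `build_feature_row`, its arguments, and the idea of passing a fixed list of country codes are all assumptions made for illustration; the README does not say how the project assembles its features.

```python
from typing import Dict, List, Sequence

N_COUNTRIES = 105      # country-prevalence columns, per the README
EMBEDDING_DIM = 768    # embedding-vector columns, per the README
FEATURE_WIDTH = 1 + 1 + N_COUNTRIES + EMBEDDING_DIM  # 875 in total


def build_feature_row(
    male: float,
    female: float,
    country_prevalence: Dict[str, float],
    embedding: Sequence[float],
    countries: Sequence[str],
) -> List[float]:
    """Concatenate the four feature groups into one 875-wide row.

    Hypothetical helper: `countries` fixes the column order of the
    country block, and countries missing from `country_prevalence`
    are filled with 0.0 (an assumption, not the project's documented rule).
    """
    if len(countries) != N_COUNTRIES:
        raise ValueError(f"expected {N_COUNTRIES} countries, got {len(countries)}")
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected a {EMBEDDING_DIM}-dim embedding, got {len(embedding)}")
    country_block = [country_prevalence.get(code, 0.0) for code in countries]
    return [male, female, *country_block, *embedding]
```

Fixing the column order up front keeps every name's row aligned, which matters once the rows are stacked into one frame for training.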

More on preferences

This is an Active Learning approach. Active Learning is useful when labelling is expensive, but it can be prone to feedback loops depending on how the iterations are constructed. This app presents three columns of names, referred to here as A, B and C, from which the user selects the single most preferred name for the round.

  • Column A is sampled from the current top 200 names
  • Column B is sampled from the next 800 (200:1000) names
  • Column C is sampled uniformly at random
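
The three-column sampling scheme above might be sketched as follows. This assumes the names are already sorted best-first by the current model. It also assumes seven names per column (21 shown, so one choice against 20 others), which is only a guess at how a 1-vs-20 round is split, and it chooses to avoid showing a name twice. The function `sample_round` is hypothetical.

```python
import random
from typing import Dict, List, Optional


def sample_round(
    ranked_names: List[str],
    per_column: int = 7,                  # assumption: 3 columns x 7 = 21 names
    rng: Optional[random.Random] = None,
) -> Dict[str, List[str]]:
    """Pick names for columns A, B and C from a best-first ranking."""
    rng = rng or random.Random()
    top = ranked_names[:200]              # column A pool: current top 200
    middle = ranked_names[200:1000]       # column B pool: ranks 200:1000
    col_a = rng.sample(top, min(per_column, len(top)))
    col_b = rng.sample(middle, min(per_column, len(middle)))
    # Column C: uniform over every name, minus those already shown
    # (the de-duplication is an assumption, not stated in the README).
    shown = set(col_a) | set(col_b)
    pool = [name for name in ranked_names if name not in shown]
    col_c = rng.sample(pool, min(per_column, len(pool)))
    return {"A": col_a, "B": col_b, "C": col_c}
```

Mixing exploitation (A), near-misses (B) and uniform exploration (C) is one way to soften the feedback loops mentioned earlier, since the model keeps seeing names it currently ranks poorly.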

Clicking on a name records the preference, retrains the model, and presents the user with a new set of names. A search box under the columns allows selection of a name that is not listed (useful for bootstrapping the model), and a button filled with flowers simply resamples using the current model.

Iterative results

I've found this model learns a preference relatively quickly, within a few rounds. One imagines the features are quite informative, particularly the embedding vector, which encodes all manner of historical, literary, and cultural associations. Results are displayed after each round.
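
Producing the per-round results (the top and bottom 50 names) could look like the sketch below, assuming the model yields one score per name. The function `top_and_bottom` and the `scores` mapping are hypothetical.

```python
from typing import Dict, List, Tuple


def top_and_bottom(scores: Dict[str, float], k: int = 50) -> Tuple[List[str], List[str]]:
    """Return (best k names, worst k names), each listed most extreme first."""
    if k <= 0:
        return [], []
    ranked = sorted(scores, key=scores.get, reverse=True)  # best first
    best = ranked[:k]
    worst = ranked[-k:][::-1]                              # worst first
    return best, worst
```

With fewer than `2 * k` names the two lists overlap, which is harmless for display but worth knowing about when bootstrapping on a small sample.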

Bias

If your name is in "the worst" list, please don't take it personally! This model is designed to uncover your preference; mine was clearly very Victorian. It's also worth noting that the nomic embedding model used here will carry biases of its own, reflecting the data it was trained on.

This project is called Earnest, reflecting the preferences of Gwendolen and Cecily, who preferred that name over Jack or John. But what did this say about their preference for other names?

Running

Support

  • The code is pretty straightforward. Best of luck!