
covid-sanity

public
361 stars
48 forks
3 issues

Commits

List of commits on branch master.
Unverified
c8a2ab52b2c7ce04097ad7200de5785e796896cd

wow biorxiv changed their api in non-backwards compatible way and broke this website, they changed rel_authors to be a list of dicts instead of just a simple string. that's not cool. making a hot fix. it's a bit gross but will work for now

kkarpathy committed 5 years ago
Unverified
8ce4543c3457174af8ced432f96b8a92fe47c809

quick workaround to make sure serve.py doesnt read a halfway written json file

kkarpathy committed 5 years ago
Unverified
bc717f26f528321cc3788061d15f2fd133fa3262

adding more bot accounts. i have to find some way to make this scalable and not require direct curation and commits

kkarpathy committed 5 years ago
Unverified
c6d566a66cea682714df9d3306edec713a44de6e

split out list of banned accounts to an external text file instead of in code

kkarpathy committed 5 years ago
Unverified
6ed549605261382f63c0db0003fe86b7754a70f4

refactor twitter daemon: be safer, and re-process most recent tweets more often

kkarpathy committed 5 years ago
Unverified
99d99b9bc2d992e6523fde4cca3ceedbfb058534

Merge branch 'yukosgiti-master'

kkarpathy committed 5 years ago

README

The README file for this repository.

covid-sanity

This project organizes COVID-19 SARS-CoV-2 preprints from medRxiv and bioRxiv. The raw data comes from the bioRxiv page, but this project makes the data searchable, sortable, etc. The "most similar" search uses an exemplar SVM trained on tfidf feature vectors from the abstracts of these papers. The project is running live on biomed-sanity.com. (I could not register covid-sanity.com because the term is "protected")

user interface

Since I can't assess the quality of the similarity search I welcome any opinions on some of the hyperparameters. For instance, the parameter C in the SVM training and the size of the feature vector max_features (currently set at 2,000) dramatically impact the results.
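The exemplar-SVM ranking described above can be sketched with scikit-learn. This is an illustrative toy version, not the repo's actual code: the corpus, function name, and the default C value are assumptions, and only max_features=2000 is taken from the README.

```python
# Sketch of "most similar" search via an exemplar SVM over tfidf vectors.
# Toy corpus and hyperparameter defaults are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

abstracts = [
    "coronavirus spike protein binding",          # the query paper
    "sars-cov-2 transmission dynamics model",
    "influenza vaccine efficacy trial",
    "spike protein structure of sars-cov-2",
]

# tfidf features; the README mentions max_features is currently set to 2,000
vectorizer = TfidfVectorizer(max_features=2000)
X = vectorizer.fit_transform(abstracts)

def most_similar(ix, C=0.01):
    """Rank all papers by similarity to paper `ix` using an exemplar SVM:
    one positive example (the query) against all other papers as negatives."""
    y = np.zeros(X.shape[0])
    y[ix] = 1
    clf = LinearSVC(C=C, class_weight="balanced")
    clf.fit(X, y)
    scores = clf.decision_function(X)  # higher score = more similar
    return np.argsort(-scores)

ranking = most_similar(0)  # the query itself should rank first
```

Both C and max_features change which papers land near the top, which is why the README singles them out as the hyperparameters worth tuning.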

This project follows a previous one of mine in spirit, arxiv-sanity.

dev

As this is a Flask app, running it locally on your own computer is relatively straightforward. First compute the database with run.py and then serve:

$ pip install -r requirements.txt
$ python run.py
$ export FLASK_APP=serve.py
$ flask run

prod

To deploy in production I recommend NGINX and Gunicorn. Linode is one easy/cheap way to host the application on the internet and they have detailed tutorials one can follow to set this up.
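A Gunicorn invocation for this setup might look like the following sketch; the module and app names (serve:app), bind address, and worker count are assumptions, not taken from the repo.

```shell
# run the Flask app behind NGINX with Gunicorn (illustrative settings)
gunicorn serve:app \
    --workers 4 \
    --bind 127.0.0.1:8000 \
    --access-logfile -
# NGINX would then proxy_pass to 127.0.0.1:8000
```

Because Gunicorn reloads on SIGHUP, the pull.sh script below can restart it gracefully without dropping connections.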

I run the server in a screen session and have a very simple script pull.sh that updates the database:

#!/bin/bash

# print the current time
now=$(TZ=":US/Pacific" date)
echo "Time: $now"
# change into the project directory
cd /root/covid-sanity
# pull the latest papers and recompute the database
python run.py
# gracefully restart the gunicorn workers
ps aux | grep gunicorn | grep app | awk '{ print $2 }' | xargs kill -HUP

And in my crontab (see crontab -l) I make sure this runs every hour, for example:

# m h  dom mon dow   command
3 * * * * /root/covid-sanity/pull.sh > /root/cron.log 2>&1

seeing tweets

Seeing the tweets for each paper is purely optional. To achieve this you need to follow the instructions on setting up the python-twitter API and then write your secrets into a file twitter.txt, which gets loaded in twitter_daemon.py. I run this daemon process in a screen session, where it loops over every paper, pulls its tweets, and saves the results.
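The repo does not document the format of twitter.txt, so here is a hypothetical sketch of loading the secrets; the one-"key = value"-pair-per-line format and the function name are assumptions.

```python
# Hypothetical loader for twitter.txt secrets; the key=value file format
# is an assumption, not documented by the repo.
def load_secrets(path="twitter.txt"):
    keys = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            k, _, v = line.partition("=")
            keys[k.strip()] = v.strip()
    return keys

# With python-twitter, the loaded keys would then be passed along the
# lines of (not executed here):
#   import twitter
#   api = twitter.Api(consumer_key=..., consumer_secret=...,
#                     access_token_key=..., access_token_secret=...)
```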

License

MIT