arxiv-sanity-lite

public
1205 stars
137 forks
11 issues

Commits

List of commits on branch master.
d7a303b410b0246fbd19087e37f1885f7ca8a9dc

add thumbnails for papers, which apparently ppl like

karpathy committed 3 years ago
f980c7947a0facddcb93f58b6d8146d7d577be1e

link to arxiv-sanity-lite instead of directly to arxiv

karpathy committed 3 years ago
48a7e01aa2654a7a282e020ed3a5c24a67c6281c

we will only send emails to serious users

karpathy committed 3 years ago
23b0e109bf1223c154613d8764dbf53ed92edd0f

fix bug in script due to schema change of pids variable earlier

karpathy committed 3 years ago
759f7e73e61ac31e50ca2212522d7439f0bb57ef

fix bug in pagination, clean up the approach a bit more

karpathy committed 3 years ago
c3cb157c9f68e1e4bdcd16c68349cb89fc3bfb18

first version of pagination w00t w00t! it's a bit hacky i think, should be possible to improve this code and make it smaller and cleaner and etc.

karpathy committed 3 years ago

README


arxiv-sanity-lite

A much lighter-weight, from-scratch rewrite of arxiv-sanity. It periodically polls the arxiv API for new papers, lets users tag papers of interest, and recommends new papers for each tag using SVMs over tfidf features of paper abstracts. You can search, rank, sort, slice and dice these results in a pretty web UI. Lastly, arxiv-sanity-lite can send you daily emails with recommendations of new papers based on your tags. Curate your tags, track recent papers in your area, and don't miss out!
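To make the recommendation idea above concrete, here is a minimal, dependency-free sketch of ranking papers for one tag with a linear SVM over tfidf features. It is an illustration only, not the code this repository runs: the function names (tfidf_vectors, train_linear_svm, recommend), the toy abstracts, and the choice of a Pegasos-style subgradient trainer are all assumptions made to keep the example self-contained. It also simplifies by training on a small library and scoring unseen candidates.

```python
import math
import random
from collections import Counter


def tfidf_vectors(texts):
    """Turn each text into a sparse, L2-normalized {term: weight} dict."""
    tokenized = [text.lower().split() for text in texts]
    n_docs = len(tokenized)
    # Document frequency: number of texts that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        vec = {}
        for term, count in Counter(tokens).items():
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0  # smoothed idf
            vec[term] = count * idf
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def dot(weights, vec):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(term, 0.0) * w for term, w in vec.items())


def train_linear_svm(vectors, labels, epochs=200, lam=0.01, seed=0):
    """Fit a bias-free linear SVM with Pegasos-style subgradient steps (hinge loss)."""
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(vectors)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            margin = labels[i] * dot(weights, vectors[i])
            shrink = 1.0 - 1.0 / step  # equals 1 - eta * lam (L2 regularization)
            for term in weights:
                weights[term] *= shrink
            if margin < 1.0:  # inside the margin: push the weights toward the label
                for term, w in vectors[i].items():
                    weights[term] = weights.get(term, 0.0) + eta * labels[i] * w
    return weights


def recommend(library, tagged_ids, candidates):
    """Rank candidates for one tag: tagged library papers are positives, the rest negatives."""
    all_papers = {**library, **candidates}
    ids = list(all_papers)
    vectors = dict(zip(ids, tfidf_vectors([all_papers[i] for i in ids])))
    train_ids = list(library)
    labels = [1.0 if pid in tagged_ids else -1.0 for pid in train_ids]
    weights = train_linear_svm([vectors[pid] for pid in train_ids], labels)
    scores = {pid: dot(weights, vectors[pid]) for pid in candidates}
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy data (made up for illustration): p1 and p2 carry the user's tag.
library = {
    "p1": "deep reinforcement learning for robot control",
    "p2": "policy gradient reinforcement learning with reward shaping",
    "p3": "galaxy cluster mass estimation using weak lensing",
    "p4": "protein folding via molecular dynamics simulation",
}
candidates = {
    "n1": "offline reinforcement learning on logged robot data",
    "n2": "weak lensing maps of a distant galaxy cluster",
}
ranked, scores = recommend(library, tagged_ids={"p1", "p2"}, candidates=candidates)
print(ranked)  # the reinforcement-learning candidate n1 ranks first
```

The key design point is that each tag gets its own classifier: the tagged papers are the positive examples, everything else is treated as negative, and candidates are sorted by their signed distance from the decision boundary.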

I am running a live version of this code on arxiv-sanity-lite.com.

Screenshot

To run

To run this locally, I use the following script to update the database with any new papers, typically scheduled as a periodic cron job:

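#!/bin/bash

# Poll the arxiv API for new papers
python3 arxiv_daemon.py --num 2000

# Exit status 0 is treated as "new papers were added"; only then recompute features
if [ $? -eq 0 ]; then
    echo "New papers detected! Running compute.py"
    python3 compute.py
else
    echo "No new papers were added, skipping feature computation"
fi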
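The script above gates the feature computation on the exit status of the first command. For anyone who prefers to drive the update from Python rather than bash, the same control flow can be sketched with the standard library's subprocess module. The helper name update_database is hypothetical and not part of this repository; the sketch only assumes, as the shell script does, that the fetch command exits with status 0 when new papers were added.

```python
import subprocess
import sys


def update_database(fetch_cmd, compute_cmd):
    """Run fetch_cmd, then run compute_cmd only if fetch_cmd exited with status 0.

    Returns True when the compute step ran, False when it was skipped.
    """
    fetch = subprocess.run(fetch_cmd)
    if fetch.returncode != 0:
        print("No new papers were added, skipping feature computation")
        return False
    print("New papers detected! Running compute.py")
    # check=True raises CalledProcessError if the compute step itself fails.
    subprocess.run(compute_cmd, check=True)
    return True


# Example call mirroring the shell script (run from the repository root):
# update_database(
#     [sys.executable, "arxiv_daemon.py", "--num", "2000"],
#     [sys.executable, "compute.py"],
# )
```

Passing the commands as argument lists avoids shell quoting issues, and using sys.executable keeps both steps on the same Python interpreter.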

Updating the database is thus a two-step process: arxiv_daemon.py first downloads the new papers via the arxiv API, and compute.py then computes the tfidf features of the papers. To serve the Flask app locally, we'd run something like:

export FLASK_APP=serve.py; flask run

The entire database is stored inside the data directory. If you'd like to run your own instance on the interwebs, I recommend simply running the above on a Linode; e.g. I currently run this code on the smallest "Nanode 1 GB" instance, indexing about 30K papers, which costs $5/month.

(Optional) If you'd like to send periodic emails to users about new papers, see the send_emails.py script. You'll also have to pip install sendgrid. I run this script in a daily cron job.

Requirements

Install via requirements:

pip install -r requirements.txt

Todos

  • Make the website mobile friendly with media queries in CSS etc
  • The metas table should not be a sqlitedict but a proper sqlite table, for efficiency
  • Build a reverse index to support faster search, right now we iterate through the entire database
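As a rough idea of what the reverse index in the last todo could look like, here is a minimal sketch: a dictionary from token to the set of paper ids containing it, so a query only touches the relevant posting sets instead of scanning every paper. The function names (build_reverse_index, search), the whitespace tokenization, and the AND semantics for multi-word queries are assumptions for illustration; the search in this repository may score and rank results differently.

```python
from collections import defaultdict


def build_reverse_index(papers):
    """Map each lowercase token to the set of paper ids whose text contains it."""
    index = defaultdict(set)
    for pid, text in papers.items():
        for token in text.lower().split():
            index[token].add(pid)
    return index


def search(index, query):
    """Return the ids of papers that contain every token in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # Intersect the smallest posting sets first to keep intermediate results small.
    postings = sorted((index.get(token, set()) for token in tokens), key=len)
    result = set(postings[0])
    for ids in postings[1:]:
        result &= ids
    return result


# Toy data, made up for illustration.
papers = {
    "p1": "deep reinforcement learning for robot control",
    "p2": "policy gradient reinforcement learning with reward shaping",
    "p3": "galaxy cluster mass estimation using weak lensing",
}
index = build_reverse_index(papers)
print(sorted(search(index, "reinforcement learning")))  # ['p1', 'p2']
```

The index costs extra memory and has to be updated whenever new papers are added, but a lookup then scales with the size of the matching posting sets rather than with the whole database.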

License

MIT