
SoPaper

public
195 stars
43 forks
3 issues

Commits

List of commits on branch master.
Verified
2b50495d376887f46609d9dfbb6a08673434a2e6

Merge pull request #13 from zxytim/master

ppwwyyxx committed a year ago
Unverified
df328b90b18d2b797495c8f55b246b095111e2ee

Link back sopaper/__main__.py

zxytim committed a year ago
Unverified
fad89a391dfadff64b66c8c00842169b9aa5e948

Downloadable

zxytim committed a year ago
Unverified
ba76fcc4d59307159f5d16d482427240eea7f310

WIP: 2to3 to all files

zxytim committed a year ago
Unverified
0246c1baeb3a863cb6415ab769f363eb86267bd6

sanitize file name (#7)

ppwwyyxx committed 7 years ago
Unverified
96dfadacc318232f918db3f843eb4fdebfc08f06

add python-magic to requirements (#8)

ppwwyyxx committed 7 years ago

README

The README file for this repository.

SoPaper, So Easy

This is a project designed for researchers to conveniently access papers they need.

The command line tool sopaper can automatically search for and download a paper from the Internet, given its title. The downloaded paper gets a readable file name (I wrote this tool because I was tired of seeing file names made of random strings). It mainly supports searching for papers in computer science.

How to Use

Install command line dependencies:

  • pdftk command line executable.
    • Using pdftk on OS X 10.11 might lead to hangs. See here for more info.
  • poppler-utils (optional)

Install the Python package: pip install --user sopaper

Usage:

$ sopaper --help
$ sopaper "Distinctive image features from scale-invariant keypoints"
$ sopaper "https://arxiv.org/abs/1606.06160"

NOTE: If you are not on a school network, you may need to configure a proxy through the environment variables http_proxy and https_proxy to be able to download from certain sites (such as 'dl.acm.org').
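
As a rough illustration of how these two variables are picked up, the sketch below sets them from Python and asks the standard library which proxies it would use. The proxy address is a placeholder assumption, not a real server, and nothing here is taken from SoPaper's own code.

```python
import os
import urllib.request

# Placeholder proxy address (an assumption for illustration); use your own.
os.environ["http_proxy"] = "http://127.0.0.1:8080"
os.environ["https_proxy"] = "http://127.0.0.1:8080"

# urllib.request.getproxies() reads the *_proxy environment variables
# and returns a mapping from URL scheme to proxy URL.
proxies = urllib.request.getproxies()
print(proxies["http"])
print(proxies["https"])
```

In a shell you would normally export both variables before running sopaper instead of setting them from Python.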

Features

The searcher module performs a fuzzy search and analyses the results from

  • Google Scholar
  • Google

and the fetcher module will further analyse the results and download papers from the following possible sources:

Searcher and Fetcher are extensible to support more websites.
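
One common way to make such a component extensible is a small registry keyed by URL pattern. The sketch below is only an illustration of that idea: `register_fetcher`, `resolve`, and `arxiv_pdf_url` are hypothetical names and do not describe SoPaper's actual API.

```python
import re
from typing import Callable, List, Optional, Pattern, Tuple

# Hypothetical registry of (URL pattern, handler) pairs. Not SoPaper's real API.
_FETCHERS: List[Tuple[Pattern[str], Callable[[str], str]]] = []


def register_fetcher(url_pattern: str):
    """Decorator that registers a handler for URLs matching url_pattern."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        _FETCHERS.append((re.compile(url_pattern), func))
        return func
    return decorator


@register_fetcher(r"arxiv\.org/abs/")
def arxiv_pdf_url(url: str) -> str:
    # arXiv serves the PDF at the same path with "abs" replaced by "pdf".
    return url.replace("/abs/", "/pdf/", 1)


def resolve(url: str) -> Optional[str]:
    """Return the download URL from the first matching handler, or None."""
    for pattern, handler in _FETCHERS:
        if pattern.search(url):
            return handler(url)
    return None


print(resolve("https://arxiv.org/abs/1606.06160"))
```

With this layout, supporting another website means adding one decorated function; the lookup code stays unchanged.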

The command line tool will directly download the paper with a clean filename. All downloaded papers will be compressed using ps2pdf from poppler-utils, if available.
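
To give a feel for what a "clean filename" could mean, here is a minimal sketch that derives a file name from a paper title. The helper `title_to_filename` and its exact rules are assumptions for illustration; SoPaper's own sanitizing logic may differ.

```python
import re


def title_to_filename(title: str, ext: str = ".pdf") -> str:
    """Hypothetical helper: turn a paper title into a safe file name."""
    # Replace characters that are invalid or awkward in file names with a space.
    cleaned = re.sub(r'[\\/:*?"<>|]', " ", title)
    # Collapse runs of whitespace and trim both ends.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned + ext


print(title_to_filename("Distinctive image features from scale-invariant keypoints"))
```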

TODO

  • Fetcher dedup: when both the arxiv abs and pdf links appear in the search results, the page is downloaded twice (maybe add a cache for requests)
  • Don't trust arxiv link from google scholar
  • Is title correctly updated for dlacm?
  • Extract title from bibtex -- more accurate?
  • Fetcher for other sites
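
For the fetcher dedup item, one possible approach (a sketch, not the project's implementation) is to map every arXiv abs/pdf URL to its paper identifier and keep only the first URL per identifier. The names `dedup_key` and `dedup_urls` are hypothetical, and the example URL on example.org is a placeholder.

```python
import re
from typing import Iterable, List

# Matches new-style arXiv identifiers in abs/ or pdf/ URLs,
# e.g. 1606.06160, 1606.06160v2 or 1606.06160v2.pdf.
_ARXIV_RE = re.compile(
    r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?(?:\.pdf)?",
    re.IGNORECASE,
)


def dedup_key(url: str) -> str:
    """Return a key shared by all URLs that point at the same arXiv paper."""
    match = _ARXIV_RE.search(url)
    # The version suffix is dropped on purpose, so v1 and v3 count as one paper.
    return "arxiv:" + match.group(1) if match else url


def dedup_urls(urls: Iterable[str]) -> List[str]:
    """Keep the first URL for each key, preserving the input order."""
    seen = set()
    unique = []
    for url in urls:
        key = dedup_key(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


results = [
    "https://arxiv.org/abs/1606.06160",
    "https://arxiv.org/pdf/1606.06160v3.pdf",
    "https://example.org/paper.pdf",  # placeholder non-arXiv result
]
print(dedup_urls(results))
```

A request cache, as the TODO suggests, would solve the same problem at a lower level; the key-based filter above only avoids visiting the duplicate link in the first place.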