pdf-to-csv-ruby

Commits

List of commits on branch master.

87054b9e82b84b4830f3be3467f1270b7c0e6ac5
Basic documentation with example
xxavriley committed 9 years ago

85f8f78d9d7899eb50896441a841192a7d905944
Add README
xxavriley committed 9 years ago

58e090e9ccd8481d86d19a9e8bcabca4f95fac4e
First draft
xxavriley committed 9 years ago

8d4cefba2c8c0234fc06394f80ae1b4f59672456
Initial commit
xxavriley committed 9 years ago

README

Parsing tables from image-based PDFs with open source tools

To install

$ brew install poppler
$ brew install tesseract --HEAD
$ brew install imagemagick --with-fftw
$ brew install gocr --with-lib --with-netpbm
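
Before running anything, it can help to confirm that the installed tools are actually on the PATH. The following is a minimal sketch, not part of this repository. It assumes the four formulae provide executables named `pdfimages`, `tesseract`, `convert` and `gocr`; the helper names `which` and `missing_tools` are hypothetical.

```ruby
# Assumed executable names for poppler, tesseract, imagemagick and gocr.
REQUIRED_TOOLS = %w[pdfimages tesseract convert gocr].freeze

# Return the full path of an executable found in the given PATH string,
# or nil when no directory contains it.
def which(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?

    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Return the subset of tools that cannot be found on the PATH.
def missing_tools(tools = REQUIRED_TOOLS, path = ENV.fetch("PATH", ""))
  tools.reject { |tool| which(tool, path) }
end

if __FILE__ == $PROGRAM_NAME
  missing = missing_tools
  if missing.empty?
    puts "All required tools were found."
  else
    puts "Missing tools: #{missing.join(', ')}"
  end
end
```

Saved under a name of your choice (for example `check_tools.rb`, a hypothetical filename), it can be run with `ruby check_tools.rb`.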

To run

$ pdfimages -png aviva_plc_annual_return_2014.pdf /tmp/out
$ cp /tmp/out-037.png .
$ ruby ocr.rb out-037.png
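
The three steps above could be wrapped in one script. This is only a sketch under stated assumptions: it relies on `pdfimages` naming its output `<root>-NNN.png` with a zero-padded image number (which is how `/tmp/out-037.png` arises), and it treats `ocr.rb` as a black box that takes an image path. The helper names and the script name `extract_and_ocr.rb` are hypothetical and do not exist in the repository.

```ruby
require "fileutils"

# Command that extracts every embedded image from the PDF as a PNG file.
def pdfimages_command(pdf, image_root)
  ["pdfimages", "-png", pdf, image_root]
end

# Path pdfimages is assumed to write for a given image number,
# e.g. ("/tmp/out", 37) gives "/tmp/out-037.png".
def extracted_image_path(image_root, index)
  format("%s-%03d.png", image_root, index)
end

# Command that runs the repository's OCR script on one image.
def ocr_command(image)
  ["ruby", "ocr.rb", image]
end

# Usage: ruby extract_and_ocr.rb FILE.pdf IMAGE_NUMBER
if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  pdf, index = ARGV
  root = "/tmp/out"

  system(*pdfimages_command(pdf, root)) || abort("pdfimages failed")

  # Base 10 is explicit so that "037" is read as 37, not as octal.
  image = extracted_image_path(root, Integer(index, 10))
  local = File.basename(image)
  FileUtils.cp(image, local)

  system(*ocr_command(local)) || abort("ocr.rb failed")
end
```

For the example above, this would be invoked as `ruby extract_and_ocr.rb aviva_plc_annual_return_2014.pdf 37`. Note that the number identifies an extracted image, not necessarily a page of the PDF.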