
pipeline-receipts

public
2 stars
1 fork
6 issues

Commits

List of commits on branch main.
Verified
47356393b0dd0eaa1cace5704c04ba6907236533

chore: sync version-sync.sh with other repos (#12)

nnatygyoon committed 2 years ago
Verified
9e1116700ca5efa88a4c31ceb04f7ec6e238d8c2

build(deps): bump pip (#13)

nnatygyoon committed 2 years ago
Verified
d12260946b94eebd4d93b35d66a1238ae5bae52b

feat: more robust notebook check script (#11)

nnatygyoon committed 2 years ago
Verified
9a53e32a2abf18190d2b004fb5367e7b31a549b4

feat: add logging config (#7)

nnatygyoon committed 2 years ago
Unverified
1e043c76d91b2c27e3ec30ec6358f3e0b59b56d1

CI: add github action and monthly dependabot

LLaverdeS committed 2 years ago
Unverified
4a0d3e4cebe8dceef45021868168f5c89aeb5bb8

feat: publish receipts parser API

LLaverdeS committed 2 years ago

README



Pre-Processing Pipeline for Receipts

This repo implements a document pre-processing pipeline for receipts. The pipeline is still under development and expects receipts as PDF or image files (JPG, PNG).

The API is hosted at https://api.unstructured.io.

☕ Getting Started

  • Using pyenv to manage virtualenvs is recommended

    • Mac install instructions:
      • brew install pyenv-virtualenv
      • pyenv install 3.8.15

    Create a virtualenv to work in and activate it, e.g. for one named receipts:

    pyenv virtualenv 3.8.15 receipts
    pyenv activate receipts

  • Run make install

  • Start a local Jupyter notebook server with make run-jupyter
    OR
    start only the FastAPI app locally with make run-web-app
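
Before sending requests, it can help to confirm that the local app is accepting connections. The following is a minimal sketch, not part of this repo: wait_for_port is a hypothetical helper, and the default of localhost:8000 is an assumption taken from the curl example later in this README, so adjust it if your make run-web-app setup listens elsewhere.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8000, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper; host and port defaults are assumed from the curl example.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)
```

Calling wait_for_port() after make run-web-app only checks that the port is open; it does not verify that the receipts endpoint itself is healthy.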

Extracting Structured Text from a Receipt Image

Once the API is running, you can extract the elements of a receipt file with the following command:

curl -X 'POST' \
  'http://localhost:8000/receipts/v0.1.0/receipts' \
  -F 'files=@<your_receipt_file>' \
  | jq -C . | less -R
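
The same request can be sketched in Python using only the standard library. The URL and the files form field come from the curl command above; build_multipart and extract_receipt are hypothetical helper names, and the response is assumed to be JSON because the curl output is piped to jq. The shape of the returned data is not documented here, so the sketch returns it as-is.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Endpoint taken from the curl example; adjust host, port, or version as needed.
API_URL = "http://localhost:8000/receipts/v0.1.0/receipts"


def build_multipart(field: str, filename: str, payload: bytes):
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def extract_receipt(path: str, url: str = API_URL):
    """POST the file under the 'files' form field and return the parsed JSON reply."""
    file_path = Path(path)
    body, content_type = build_multipart("files", file_path.name, file_path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # Assumes the API answers with a JSON document, as the jq pipe suggests.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running locally, extract_receipt("<your_receipt_file>") would return the parsed response for that file.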

Generating Python files from the pipeline notebooks

You can generate the FastAPI APIs from your pipeline notebooks by running make generate-api.

💂‍♂️ Security Policy

See our security policy for information on how to report security vulnerabilities.

🤗 Hugging Face

Hugging Face Spaces offer a simple way to host ML demo apps, models and datasets directly on our organization’s profile. This allows us to showcase our projects and work collaboratively with other people in the ML ecosystem. Visit our space here!

Learn more

Section | Description
Company Website | Unstructured.io product and company info
Fine-tuned Models and Data | CORD Consolidated Receipt dataset and Donut model