pipeline-document-layout

Public repository: 1 star, 1 fork, 2 issues

Commits

List of commits on branch main.
  • chore: update all bash scripts to use shebang: /usr/bin/env bash (#33)
    ryannikolaidis committed 2 years ago (Verified, cdef20ae1e9b82ec4d4100f234bb46a370f29777)
  • pip version bump and Dockerfile update (#30)
    natygyoon committed 2 years ago (Verified, 7d28e2e2b3c1126ccccee5deee1f88d6cac5f494)
  • sync version-sync.sh with other repos (#29)
    natygyoon committed 2 years ago (Verified, 0606c04fcbf38edab0a80acfa0ff863753f3035a)
  • robust notebook check script (#28)
    natygyoon committed 2 years ago (Verified, c9674d369f92bec6010a6f937df462cd82ad9d23)
  • Initial and repo files
    committed 2 years ago (Unverified, c4a36dd5647ea454d854a017ba43141593488eea)
  • Initial commit for layout pipeline
    committed 2 years ago (Unverified, 255bb273e91ecfea8f847d99718a73f53775f2c7)

README

Pre-Processing Pipeline for Layout Detection

This repository contains a pre-processing pipeline that detects the layout of documents and serves it through a FastAPI web app. The API is hosted at https://api.unstructured.io.

Developer Quick Start

  • Using pyenv to manage virtualenvs is recommended

    • Mac install instructions. See here for more detailed instructions.

      • brew install pyenv-virtualenv
      • pyenv install 3.8.15
    • Linux instructions are available here.

    • Create a virtualenv to work in and activate it, e.g. for one named document_layout:

      pyenv virtualenv 3.8.15 document_layout
      pyenv activate document_layout

  • Run make install

  • Run pip install 'git+https://github.com/facebookresearch/detectron2.git@v0.4#egg=detectron2'

  • Start a local Jupyter notebook server with make run-jupyter, or just start the FastAPI app locally with make run-web-app
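Before running the notebooks or the web app, it can help to confirm that the interpreter and the manually installed dependencies match what the steps above expect. The following is a minimal sketch, not part of the repository: the check_environment helper is hypothetical, it only compares the major.minor version against 3.8 (from pyenv install 3.8.15), and the list of modules to probe (detectron2, plus fastapi, assumed because the app is built with FastAPI) is an assumption.

```python
import importlib.util
import sys

# Major.minor of the Python version pinned by the Quick Start (pyenv install 3.8.15).
EXPECTED_PYTHON = (3, 8)

# Modules the setup steps should make importable. detectron2 comes from the
# pip install step; fastapi is an assumption based on the app being a FastAPI app.
REQUIRED_MODULES = ("detectron2", "fastapi")


def check_environment(version_info=sys.version_info, modules=REQUIRED_MODULES):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(version_info[:2]) != EXPECTED_PYTHON:
        problems.append(
            f"expected Python {EXPECTED_PYTHON[0]}.{EXPECTED_PYTHON[1]}, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    for name in modules:
        # find_spec locates a top-level module without importing it;
        # it returns None when the module cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not importable: {name}")
    return problems
```

Calling check_environment() with no arguments inspects the active interpreter inside the activated virtualenv; any returned strings point at the step that likely needs to be rerun.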

Detecting the layout of a document

For example:

curl -X 'POST' \
  'http://localhost:8000/document-layout/v1.0.0/layout' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'files=@sample-docs/example.png' -F 'model_type=yolox' | jq -C . | less -R

Here, files is the file to process and model_type is either 'default' (or blank) or 'yolox'. You can also set force_ocr to 'auto' to attempt text extraction from the file, or to 'true' to always use OCR.
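The same request can be made from Python. The sketch below uses only the standard library and builds, without sending, a multipart/form-data POST equivalent to the curl call above. The helper name build_layout_request is hypothetical, and sending force_ocr as a form field next to model_type is an assumption, since the README only names the parameter and its values.

```python
import mimetypes
import urllib.request
import uuid

# Endpoint from the curl example; adjust host/port if the app runs elsewhere.
LAYOUT_URL = "http://localhost:8000/document-layout/v1.0.0/layout"

# Accepted values as described in the README ("" stands for a blank model_type).
MODEL_TYPES = {"", "default", "yolox"}
FORCE_OCR_VALUES = {"auto", "true"}


def build_layout_request(file_name, file_bytes, model_type="yolox", force_ocr=None, url=LAYOUT_URL):
    """Build (but do not send) a multipart/form-data POST mirroring the curl example."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unsupported model_type: {model_type!r}")
    if force_ocr is not None and force_ocr not in FORCE_OCR_VALUES:
        raise ValueError(f"unsupported force_ocr: {force_ocr!r}")

    boundary = uuid.uuid4().hex
    fields = {"model_type": model_type}
    if force_ocr is not None:
        # Assumption: force_ocr is sent as a plain form field, like model_type.
        fields["force_ocr"] = force_ocr

    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )

    # The file goes in the "files" field, as in -F 'files=@...'.
    # file_name is not escaped here, so keep it to simple names.
    content_type = mimetypes.guess_type(file_name)[0] or "application/octet-stream"
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{file_name}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary

    return urllib.request.Request(
        url,
        data=b"".join(parts),
        headers={
            "accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

To send it, pass the returned request to urllib.request.urlopen and decode the response with json.load. Encoding the multipart body by hand keeps the example dependency-free; a client library such as requests (via its files= argument) would handle that encoding for you.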

Generating Python files from the pipeline notebooks

You can generate the FastAPI APIs from your pipeline notebooks by running make generate-api.
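Conceptually, this step turns selected notebook code into plain Python source. The sketch below illustrates only that extraction idea and is not what make generate-api actually runs: it reads an .ipynb file (nbformat 4 JSON, where each cell has a cell_type and a source) and keeps the code cells whose first line is a marker comment. The marker string # pipeline-api and the helper extract_pipeline_source are assumptions for illustration.

```python
import json

# Hypothetical marker: only code cells whose first line is exactly this comment are kept.
# This is an illustrative convention, not necessarily the one make generate-api uses.
MARKER = "# pipeline-api"


def extract_pipeline_source(notebook_json, marker=MARKER):
    """Concatenate marked code cells from nbformat 4 JSON into one Python source string."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat stores source either as a single string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        lines = source.splitlines()
        if lines and lines[0].strip() == marker:
            # Drop the marker line itself and keep the rest of the cell.
            chunks.append("\n".join(lines[1:]).strip("\n"))
    return "\n\n".join(chunks) + "\n" if chunks else ""
```

Exploratory cells without the marker are left out, which is the main point of selecting cells explicitly. The generator described above goes further and produces FastAPI API code; this sketch stops at collecting the source.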

Security Policy

See our security policy for information on how to report security vulnerabilities.

Learn more

  • Unstructured Community GitHub: Information about Unstructured.io community projects
  • Unstructured GitHub: Unstructured.io open source repositories
  • Company Website: Unstructured.io product and company info