Pre-Processing Pipeline for Layout Detection
The API is hosted at https://api.unstructured.io.
- Using pyenv to manage virtualenvs is recommended.
  - Mac install instructions (see here for more detailed instructions):
      brew install pyenv-virtualenv
      pyenv install 3.8.15
  - Linux instructions are available here.
- Create a virtualenv to work in and activate it, e.g. for one named document_layout:
      pyenv virtualenv 3.8.15 document_layout
      pyenv activate document_layout
- Run:
      make install
- Run:
      pip install 'git+https://github.com/facebookresearch/detectron2.git@v0.4#egg=detectron2'
- Start a local Jupyter notebook server with:
      make run-jupyter
  OR just start the FastAPI app locally with:
      make run-web-app
For example:
curl -X 'POST' \
'http://localhost:8000/document-layout/v1.0.0/layout' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'files=@sample-docs/example.png' -F 'model_type=yolox' | jq -C . | less -R
Here, files is the file to process, and model_type can be 'default' (or blank) or 'yolox'. You can also set force_ocr to 'auto' to attempt text extraction directly from your file, or to 'true' to always use OCR.
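The same request can be sent from Python. The sketch below uses only the standard library (urllib) to build the multipart/form-data POST that the curl command above sends. The endpoint URL and the files and model_type fields come from the curl example. Sending force_ocr as another form field, the accepted-value sets, and the helper names build_layout_request and post_layout are assumptions for illustration, not part of this repository. The JSON response is returned as-is because its schema is not described here.

```python
import json
import mimetypes
import urllib.request
import uuid
from typing import Optional

# Endpoint taken from the curl example; adjust host/port for your setup.
LAYOUT_URL = "http://localhost:8000/document-layout/v1.0.0/layout"

# Values described in the README ("" is the same as "default").
# ASSUMPTION: force_ocr is sent as a plain form field, like model_type.
ALLOWED_MODEL_TYPES = {"", "default", "yolox"}
ALLOWED_FORCE_OCR = {None, "auto", "true"}


def build_layout_request(
    filename: str,
    content: bytes,
    model_type: str = "yolox",
    force_ocr: Optional[str] = None,
    url: str = LAYOUT_URL,
) -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to the curl example (hypothetical helper)."""
    if model_type not in ALLOWED_MODEL_TYPES:
        raise ValueError(f"unsupported model_type: {model_type!r}")
    if force_ocr not in ALLOWED_FORCE_OCR:
        raise ValueError(f"unsupported force_ocr: {force_ocr!r}")

    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    # File part, like `-F 'files=@sample-docs/example.png'` in curl.
    parts = [
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            f"Content-Type: {file_type}\r\n\r\n"
        ).encode()
        + content
        + b"\r\n"
    ]

    # Plain text fields, like `-F 'model_type=yolox'` in curl.
    fields = {"model_type": model_type}
    if force_ocr is not None:
        fields["force_ocr"] = force_ocr
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )

    # Closing boundary terminates the multipart body.
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()

    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def post_layout(request: urllib.request.Request) -> object:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, with make run-web-app running, read sample-docs/example.png as bytes, pass them to build_layout_request("example.png", data) and hand the result to post_layout. Building the request separately from sending it keeps the form encoding easy to inspect without a server.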
You can generate the FastAPI APIs from your pipeline notebooks by running make generate-api.
See our security policy for information on how to report security vulnerabilities.
| Section | Description |
|---|---|
| Unstructured Community GitHub | Information about Unstructured.io community projects |
| Unstructured GitHub | Unstructured.io open source repositories |
| Company Website | Unstructured.io product and company info |