flash-pix2struct-azureml

Commits

List of commits on branch main.
Unverified
ac6f3af0043de0ae3fee3080e689a7486fdce3dc

format with black

nnbroad1881 committed 2 years ago
Unverified
63f509e6bb899de24a29f2fd3c8ac830875aa90e

add docstrings

nnbroad1881 committed 2 years ago
Unverified
9b6fa797da93f16757d7133dec732e0eb010df54

add readme, var for config.json

nnbroad1881 committed 2 years ago
Unverified
0ccd600994da74a8106baf603e5ecd12aa744d98

first commit

nnbroad1881 committed 2 years ago
Verified
ee404f407115cc62e90eedb764e20b0459d4cc44

Initial commit

nnbroad1881 committed 2 years ago

README

The README file for this repository.

flash-pix2struct-azureml

This repo contains the code to run Pix2Struct in Azure ML using flash attention. Flash attention is used only in the encoder, because the decoder has an attention bias mechanism that isn't compatible with it. Flash attention can help save memory when training with a large number of patches (2k+).
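To give a rough sense of why the number of patches matters, here is a back-of-the-envelope sketch. Standard attention materializes a (patches x patches) score matrix per head, while flash attention computes the same result in tiles without storing that matrix. The head count (12) and 2-byte (fp16/bf16) elements are assumptions for illustration, not values read from this repo's config.

```python
def attention_scores_bytes(num_patches, num_heads=12, bytes_per_element=2, batch_size=1):
    """Bytes needed to hold the full attention score matrix for one layer.

    num_heads=12 and bytes_per_element=2 (fp16/bf16) are illustrative assumptions.
    """
    return batch_size * num_heads * num_patches * num_patches * bytes_per_element


if __name__ == "__main__":
    for n in (512, 2048, 4096):
        mib = attention_scores_bytes(n) / 2**20
        print(f"{n:>5} patches -> {mib:,.0f} MiB per layer per sample")
```

Under these assumptions, the score matrix grows from 6 MiB at 512 patches to 96 MiB at 2048 and 384 MiB at 4096, per layer and per sample, before counting the copies kept for the backward pass. The growth is quadratic, which is why the savings show up mainly at large patch counts.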

The existing code uses a public dataset, so you'll likely want to point it to a local directory of files. I do not do any processing on the dataset, so the model will try to generate raw JSON. Some other approaches add special tokens that delimit the generated text to make it easier to parse.
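As a minimal sketch of what parsing the output could look like, the helper below tries to decode generated text as JSON, optionally stripping delimiter tokens first. The `<json>` and `</json>` tokens and the `parse_generation` name are hypothetical; this repo does not add any delimiters, and nothing here is taken from its code.

```python
import json

# Hypothetical delimiter tokens, used only for illustration.
START, END = "<json>", "</json>"


def parse_generation(text, start=START, end=END):
    """Return the decoded JSON value from generated text, or None if parsing fails."""
    begin = text.find(start)
    if begin != -1:
        # Keep only the text between the delimiters (or to the end if unclosed).
        begin += len(start)
        stop = text.find(end, begin)
        text = text[begin:stop if stop != -1 else None]
    try:
        return json.loads(text.strip())
    except json.JSONDecodeError:
        # Truncated or malformed generations are common; signal failure to the caller.
        return None
```

Without delimiters the whole generation has to be valid JSON, so a single stray token makes the parse fail. With delimiters, anything outside them can be ignored.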

This uses the PyTorch NGC Container from April 2023 (nvcr.io/nvidia/pytorch:23.04-py3). To use prompts with the model, the following was added to the Dockerfile.

RUN apt-get update && apt-get -y install libfreetype6-dev
RUN pip uninstall -y pillow && \
    pip install --no-cache-dir pillow

The default font file is also included.
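If prompts fail inside the container, one thing worth checking is whether Pillow was built with FreeType support. This assumes the Dockerfile change above exists to enable font rendering. The sketch below uses Pillow's `features` module. The `freetype_available` helper name is hypothetical.

```python
def freetype_available():
    """Return True/False for Pillow's FreeType support, or None if Pillow is missing."""
    try:
        from PIL import features
    except ImportError:
        return None
    return bool(features.check("freetype2"))


if __name__ == "__main__":
    print("FreeType support in Pillow:", freetype_available())
```

A result of `False` would suggest Pillow was installed without FreeType and needs to be reinstalled after the system library is in place.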

Running this is as simple as executing each cell in the notebook. Everything else should be self-explanatory, but if you have a question, please open an issue.