aws-blog-post-example

Commits

List of commits on branch main:

  • cae3ec1546940c30060fcd21b94a42a2f220331c (Verified): Merge pull request #1 from Unstructured-IO/aws-example. MMKhalusova committed 3 months ago.
  • 07c24b5c3c7ae07237b9ebc87c3fc10d6244ff86 (Unverified): aws blog post code example. MMKhalusova committed 3 months ago.
  • b5d73da9c8eeac7f511d0190553ecd62b193e479 (Unverified): Reverting changes. MMKhalusova committed 3 months ago.
  • 5646047f8d11b517b813bfb50506dbcf32d9eb8a (Unverified): Code example for the AWS blog post. MMKhalusova committed 3 months ago.
  • 4454a7e4c9a8d430919496a15949fb57781b651d (Verified): Initial commit. MMKhalusova committed 3 months ago.

README

aws-blog-post-example

This repository contains a script to accompany the Unstructured.io blog post written in collaboration with AWS.

A link to the blog post is coming soon.

The blog post illustrates how Unstructured.io's Serverless API can transform unstructured data into structured JSON that RAG systems on AWS can consume. It provides a step-by-step guide to using the Unstructured API, detailing each stage of the data transformation process: ingestion, partitioning, extraction, chunking, embedding with Amazon Bedrock, and syncing with Amazon OpenSearch.
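At a high level, the script wires these stages into a single ingest pipeline. Below is a minimal sketch of what such a pipeline can look like with the unstructured-ingest v2 Pipeline API; the module paths, config classes, and environment variable names are illustrative assumptions that may differ between library versions (and from the actual run_pipeline.py), so treat it as an outline of the flow rather than a drop-in replacement for the script in this repo.

import os

from unstructured_ingest.v2.pipeline.pipeline import Pipeline
from unstructured_ingest.v2.interfaces import ProcessorConfig
from unstructured_ingest.v2.processes.connectors.fsspec.s3 import (
    S3AccessConfig,
    S3ConnectionConfig,
    S3DownloaderConfig,
    S3IndexerConfig,
)
from unstructured_ingest.v2.processes.partitioner import PartitionerConfig
from unstructured_ingest.v2.processes.chunker import ChunkerConfig
from unstructured_ingest.v2.processes.embedder import EmbedderConfig
from unstructured_ingest.v2.processes.connectors.opensearch import (
    OpenSearchAccessConfig,
    OpenSearchConnectionConfig,
    OpenSearchUploaderConfig,
    OpenSearchUploadStagerConfig,
)

if __name__ == "__main__":
    Pipeline.from_configs(
        context=ProcessorConfig(),
        # Ingestion: list and download the source documents from an S3 bucket.
        indexer_config=S3IndexerConfig(remote_url=os.getenv("AWS_S3_URL")),
        downloader_config=S3DownloaderConfig(),
        source_connection_config=S3ConnectionConfig(
            access_config=S3AccessConfig(
                key=os.getenv("AWS_ACCESS_KEY_ID"),
                secret=os.getenv("AWS_SECRET_ACCESS_KEY"),
            )
        ),
        # Partitioning and extraction: send documents to the Unstructured
        # Serverless API, which returns structured JSON elements.
        partitioner_config=PartitionerConfig(
            partition_by_api=True,
            api_key=os.getenv("UNSTRUCTURED_API_KEY"),
            partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
        ),
        # Chunking: group the extracted elements into retrieval-sized chunks.
        chunker_config=ChunkerConfig(chunking_strategy="by_title"),
        # Embedding: generate vectors with an Amazon Bedrock embedding model.
        embedder_config=EmbedderConfig(
            embedding_provider="bedrock",  # provider id may be "aws-bedrock" in older versions
            embedding_aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
            embedding_aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
            embedding_aws_region=os.getenv("AWS_REGION"),
        ),
        # Syncing: write the embedded chunks into an Amazon OpenSearch index.
        destination_connection_config=OpenSearchConnectionConfig(
            hosts=[os.getenv("OPENSEARCH_HOST")],
            username=os.getenv("OPENSEARCH_USERNAME"),
            access_config=OpenSearchAccessConfig(password=os.getenv("OPENSEARCH_PASSWORD")),
        ),
        stager_config=OpenSearchUploadStagerConfig(index_name=os.getenv("OPENSEARCH_INDEX")),
        uploader_config=OpenSearchUploaderConfig(index_name=os.getenv("OPENSEARCH_INDEX")),
    ).run()

Each *_config argument corresponds to one stage of the pipeline, so swapping the source or destination is largely a matter of replacing the relevant connector configs.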

To use this example:

  1. Download and install Python 3.9.0 or later.
  2. Clone the repo and create a virtual environment (example commands follow this list).
  3. In the new virtual environment, install the required dependencies:
    • Open your terminal in the root directory of the cloned repo.
    • Run either pip install "unstructured-ingest[s3, opensearch, pdf, bedrock]" to install the latest library versions, or pip install -r requirements.txt to use the specific versions pinned in requirements.txt.
  4. Open run_pipeline.py and add your values for the environment variables required to authenticate with the Unstructured Serverless API, Amazon OpenSearch, Amazon S3, and Amazon Bedrock.
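For steps 2 and 3, a typical command sequence looks like the following; the clone URL is an assumption based on the Unstructured-IO organization named in the commit history, so adjust it if your copy lives elsewhere:

git clone https://github.com/Unstructured-IO/aws-blog-post-example.git
cd aws-blog-post-example
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt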

You can now run the script from your terminal by executing:

python run_pipeline.py