dag-data-etl-pipeline

Commits

List of commits on branch master (all committed by ddarwinz, 5 years ago):

  1. 1aadc059419e374bd73ce24946307adc5f3a2793 (verified): Update README.md
  2. a9664b698e27f6567da4a8ae1b689f89e0c7cdc5 (verified): Update README.md
  3. a6f175baf5e6009517cb98af906c12a63aa85cbe (unverified): Updated README
  4. e22288e79d6542f2421845ae408830a94ecb34ba (verified): Update README.md
  5. a60cc9a1e2cf28bdf84e023411d0b908136c51b0 (unverified): DAG Data ETL Pipeline Exercise

DAG Data ETL Pipeline Exercise

Description

An AWS Lambda function that listens for a data file arriving in S3, retrieves the file, and inserts each item into DynamoDB as JSON.
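A minimal sketch of such a handler follows. It is not the repository's code: it assumes Python with boto3, an S3 `ObjectCreated` notification as the trigger, a data file containing a JSON array of nodes, and a table name supplied through a hypothetical `TABLE_NAME` environment variable. The optional `s3` and `table` parameters are also hypothetical and exist only so the logic can be exercised without AWS.

```python
import json
import os
import urllib.parse


def parse_s3_event(event):
    """Yield (bucket, key) pairs from an S3 event notification."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (spaces become '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key


def handler(event, context, s3=None, table=None):
    """Load each uploaded JSON file and write every node to DynamoDB.

    `s3` and `table` are hypothetical injection points for testing; when
    omitted, real boto3 clients are created.
    """
    if s3 is None or table is None:
        import boto3  # imported lazily so the logic runs without boto3

        if s3 is None:
            s3 = boto3.client("s3")
        if table is None:
            # TABLE_NAME is an assumed environment variable, not from the README.
            table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])

    inserted = 0
    for bucket, key in parse_s3_event(event):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        nodes = json.loads(body)  # expected: a JSON array of node objects
        # batch_writer groups put_item calls into BatchWriteItem requests.
        with table.batch_writer() as batch:
            for node in nodes:
                batch.put_item(Item=node)
                inserted += 1
    return {"inserted": inserted}
```

When deployed as the function's entry point, Lambda passes only `event` and `context`, so the real boto3 clients are used.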

The data is stored in a non-relational data store (DynamoDB) as JSON describing a directed acyclic graph (DAG). Each object is a node holding its name and the names of its children, similar to the following:

[
  {"name": "organism", "children": ["animal", "plant"]}, 
  {"name": "animal", "children": ["frog", "mammal"]}, 
  {"name": "frog", "children": []},
  {"name": "mammal", "children": ["dog"]},
  {"name": "dog", "children": []}, 
  {"name": "plant", "children": ["tree"]}, 
  {"name": "tree", "children": []}
]
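Because each node lists only its children, a consumer of this data has to rebuild the graph to traverse it, and may want to confirm that it really is acyclic. The helper below is an illustrative sketch, not part of this repository: it builds an adjacency map from the node list and applies Kahn's topological sort, raising an error when some nodes can never be emitted (which happens exactly when the graph contains a cycle).

```python
from collections import deque


def topological_order(nodes):
    """Return node names in topological order; raise ValueError on a cycle.

    `nodes` is a list of {"name": str, "children": [str, ...]} dicts.
    """
    children = {n["name"]: list(n["children"]) for n in nodes}
    # Children that are referenced but never declared are treated as leaves.
    for kids in list(children.values()):
        for kid in kids:
            children.setdefault(kid, [])

    # Count incoming edges (parents) for every node.
    indegree = {name: 0 for name in children}
    for kids in children.values():
        for kid in kids:
            indegree[kid] += 1

    # Start from the roots, i.e. nodes with no parents.
    queue = deque(name for name, d in indegree.items() if d == 0)
    order = []
    while queue:
        name = queue.popleft()
        order.append(name)
        for kid in children[name]:
            indegree[kid] -= 1
            if indegree[kid] == 0:
                queue.append(kid)

    if len(order) != len(children):
        raise ValueError("graph contains a cycle")
    return order
```

With the example data above, the result starts with `organism`, the only root, followed by its descendants level by level.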

Deployment

To deploy the solution to AWS, run the provided deploy.sh shell script. It zips the files, packages them as a CloudFormation artifact, and then deploys the CloudFormation stack using the AWS CLI:

$ ./deploy.sh
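The steps deploy.sh performs can be sketched in Python as follows. This is a hypothetical reconstruction, not the contents of deploy.sh: the source file list, template name, artifact bucket, and stack name are placeholders, and `--capabilities CAPABILITY_IAM` is assumed because a Lambda stack normally creates an execution role. With `dry_run=True` (the default) the sketch only prints the AWS CLI calls.

```python
import subprocess
import zipfile
from pathlib import Path


def build_zip(sources, zip_path):
    """Zip the Lambda source files (flat layout) into zip_path."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for src in sources:
            zf.write(src, arcname=Path(src).name)
    return Path(zip_path)


def deploy_commands(template, bucket, stack_name, packaged="packaged.yaml"):
    """Return the AWS CLI calls that package and deploy a CloudFormation stack."""
    return [
        # Uploads local artifacts referenced by the template to S3 and writes
        # a copy of the template that points at the uploaded objects.
        ["aws", "cloudformation", "package",
         "--template-file", template,
         "--s3-bucket", bucket,
         "--output-template-file", packaged],
        # Creates or updates the stack from the packaged template.
        ["aws", "cloudformation", "deploy",
         "--template-file", packaged,
         "--stack-name", stack_name,
         "--capabilities", "CAPABILITY_IAM"],
    ]


def deploy(sources, zip_path, template, bucket, stack_name, dry_run=True):
    """Zip the sources, then package and deploy (or just print the commands)."""
    build_zip(sources, zip_path)
    commands = deploy_commands(template, bucket, stack_name)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```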

Delete the CloudFormation Stack

To delete the CloudFormation stack at any time after it has been deployed, run the provided delete-stack.sh shell script:

$ ./delete-stack.sh
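A comparable teardown can be sketched with two AWS CLI calls. This is an assumption about what delete-stack.sh does, not its actual contents, and the stack name is a placeholder. Because `delete-stack` returns immediately, the sketch follows it with the `stack-delete-complete` waiter so the call blocks until the stack is gone. The `run` parameter is a hypothetical hook for testing.

```python
import subprocess


def delete_stack(stack_name, run=subprocess.run):
    """Delete a CloudFormation stack and block until deletion finishes."""
    commands = [
        ["aws", "cloudformation", "delete-stack", "--stack-name", stack_name],
        # delete-stack only starts deletion; the waiter polls until it completes.
        ["aws", "cloudformation", "wait", "stack-delete-complete",
         "--stack-name", stack_name],
    ]
    for cmd in commands:
        run(cmd, check=True)  # raise if either CLI call fails
    return commands
```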

Assumptions

  1. The local environment is macOS Mojave (version 10.14.5).
  2. The AWS CLI is already installed.
  3. The default local AWS CLI profile has admin access to all AWS services in the Oregon (us-west-2) region.
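Two of these assumptions can be checked cheaply before deploying. The sketch below is an illustration, not a script from this repository: it confirms that `aws` is on the PATH and that `aws configure get region` reports us-west-2 for the default profile. It does not verify admin permissions. The `which` and `run` parameters are hypothetical hooks for testing.

```python
import shutil
import subprocess


def check_prerequisites(which=shutil.which, run=subprocess.run):
    """Return a list of problems with the assumed local setup (empty if OK)."""
    if which("aws") is None:
        return ["AWS CLI not found on PATH"]
    problems = []
    # Prints the default profile's region; output is empty if none is set.
    result = run(["aws", "configure", "get", "region"],
                 capture_output=True, text=True)
    region = result.stdout.strip()
    if region != "us-west-2":
        problems.append(f"default region is {region or 'unset'}, expected us-west-2 (Oregon)")
    return problems
```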

Testing

Tests can be run with Python's built-in unittest module:

$ python -m unittest test*
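For illustration only, a test module that this command would pick up might look like the following. The file name `test_nodes.py` and the `validate_node` helper are hypothetical and not taken from this repository. The module deliberately has no `unittest.main()` call, because `python -m unittest` loads the test cases itself.

```python
import unittest


def validate_node(node):
    """Return True if node matches the {"name": str, "children": [str]} shape."""
    return (
        isinstance(node, dict)
        and isinstance(node.get("name"), str)
        and isinstance(node.get("children"), list)
        and all(isinstance(c, str) for c in node["children"])
    )


class ValidateNodeTest(unittest.TestCase):
    def test_accepts_leaf_node(self):
        self.assertTrue(validate_node({"name": "frog", "children": []}))

    def test_accepts_inner_node(self):
        self.assertTrue(validate_node({"name": "animal", "children": ["frog", "mammal"]}))

    def test_rejects_missing_children(self):
        self.assertFalse(validate_node({"name": "dog"}))

    def test_rejects_non_string_child(self):
        self.assertFalse(validate_node({"name": "plant", "children": [42]}))
```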