Sediment API

Welcome to the Sediment API repository! I wrote the code in this repository as part of an exercise. The exercise prompt was, in summary, to (a) write a Python script someone could use to extract data records from a CSV file and store them in a MongoDB database; and (b) create an HTTP API that someone could use to retrieve a data record in JSON format.

Overview

This repository contains Python scripts people can use to extract data from CSV files, store that data in a MongoDB database, and provide access to that data via an HTTP API.

The scripts are:

parser/parser.py: A file parser people can use to extract data from a CSV file and insert it into a database
server/server.py: A web server people can use to provide access to that data via an HTTP API

Here's a diagram showing how data flows into, between, and out of those scripts.

```mermaid
%% This is a flowchart written using Mermaid syntax.
%% GitHub will render it as an image.
%%
%% References:
%% - https://mermaid.js.org/syntax/flowchart.html
%% - https://github.blog/2022-02-14-include-diagrams-markdown-files-mermaid/

flowchart LR
    parser[[parser.py]]
    db[(Database)]
    file[CSV File]
    client[HTTP Client]
    server[[server.py]]

    parser --> db
    db --> server

    subgraph File Parser
        parser
    end

    file -. CSV .-> parser

    subgraph Web Server
        server
    end

    server -. JSON .-> client
```
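
The data flow in the diagram above can be sketched in a few lines of standard-library Python. This is only an illustration, not the code in `parser.py` or `server.py`: an in-memory dict stands in for MongoDB, the function names are made up for this example, and the CSV content is invented (only the `Sample_ID` column name and the sample ID format come from this repository).

```python
import csv
import io
import json
from typing import Optional

# Invented sample data. The `Percent_Fine_Sand` column is hypothetical.
CSV_TEXT = """Sample_ID,Percent_Fine_Sand
S19S_0001_BULK-D,12.5
S19S_0002_BULK-D,-9999
"""


def parse_csv(text: str) -> list[dict[str, str]]:
    """Extract one dict per data row, keyed by the CSV header."""
    return list(csv.DictReader(io.StringIO(text)))


def store(records: list[dict[str, str]]) -> dict[str, dict[str, str]]:
    """Stand-in for the database: index the records by sample ID."""
    return {record["Sample_ID"]: record for record in records}


def get_sample_json(db: dict[str, dict[str, str]], sample_id: str) -> Optional[str]:
    """Stand-in for the web server: return one record as JSON, or None."""
    record = db.get(sample_id)
    return None if record is None else json.dumps(record)


db = store(parse_csv(CSV_TEXT))
print(get_sample_json(db, "S19S_0001_BULK-D"))
```

In the real system, the middle step is a MongoDB collection and the last step is an HTTP endpoint, but the shape of the data (one record per CSV row, looked up by sample ID) is the same.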

Usage

Exercise-specific

Here's how you can produce the behavior described in the exercise prompt.

1. Install Docker onto your computer.
2. Clone (or download and extract) this repository onto your computer.
3. Open a console in the root folder of the repository.
4. Copy the example config file and name the copy ".env".
   ```
   cp .env.example .env
   ```
5. Start the web server and MongoDB (in Docker containers).
   ```
   docker-compose up
   ```
   Note: That command will run the containers in the foreground, taking over your console. You can open a new console to issue the remaining commands.
6. Run the parser (in the app container).
   ```
   docker exec -it app python parser/parser.py parser/example_data/WHONDRS_S19S_Sediment_GrainSize.csv
   ```
7. In a web browser, visit http://localhost:8000/samples/S19S_0001_BULK-D
   - The web browser will show a sample in JSON format.

General

Here's how you can use the system in general.

1. Do steps 1-5 shown in the "Exercise-specific" section above.
2. (Optional) Put a CSV file you want to parse anywhere within the repository's file tree.

   Note: All files within the repository's file tree are accessible within the app container (within the app container, the root folder of the repository is located at /code).
3. Run the parser, specifying the path to the CSV file you want to parse.
   ```
   # Specify the path as it would be specified within the `app` container.
   docker exec -it app python parser/parser.py <path_to_csv_file>
   ```
   Note: You can specify the path as either an absolute path, using /code to refer to the root folder of the repository (e.g. /code/path/to/file.csv); or a relative path, relative to the root folder of the repository (e.g. ./path/to/file.csv).
4. Submit an HTTP GET request to a URL having the format: http://localhost:8000/samples/<sample_id>
5. (Optional) Visit the interactive API documentation at http://localhost:8000/docs
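
If you would rather submit the GET request from Python than from a browser, a minimal sketch using only the standard library might look like the following. The helper names `sample_url` and `fetch_sample` are made up for this example, and `fetch_sample` assumes the web server is running and the requested sample has already been parsed into the database.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # Address used throughout this README


def sample_url(sample_id: str) -> str:
    """Build the URL for one sample, percent-encoding the ID if needed."""
    encoded_id = urllib.parse.quote(sample_id, safe="")
    return f"{BASE_URL}/samples/{encoded_id}"


def fetch_sample(sample_id: str) -> dict:
    """Submit the GET request and decode the JSON response body.

    Only works while the web server is running.
    """
    with urllib.request.urlopen(sample_url(sample_id)) as response:
        return json.load(response)


# Building the URL needs no server; calling `fetch_sample` does.
print(sample_url("S19S_0001_BULK-D"))
```

Calling `fetch_sample("S19S_0001_BULK-D")` after completing the steps in the "Exercise-specific" section should return the same sample the browser shows.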

You can also run tests, perform static type checking, and format the code. Instructions for doing those things are in the "Development" section below.

Development

Note: You can issue all the commands shown in this section from the root folder of the repository.

Environment

This repository contains a Docker-based development environment.

You can configure the development environment (and the Python scripts) by copying the .env.example file and naming it .env.

```
cp .env.example .env
```

Note: The default values in .env.example are adequate for running the Python scripts in the development environment.
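
To give a rough idea of what a `.env` file holds and how a script can read one, here is a simplified standard-library sketch. It is not how this repository loads its configuration (that is done with `python-dotenv`), and the variable names below are hypothetical; the real ones are listed in `.env.example`.

```python
# Hypothetical file contents; the real variable names are in `.env.example`.
ENV_TEXT = """
# Comment lines and blank lines are ignored
MONGO_HOST=mongo
MONGO_PORT=27017
"""


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines (no quoting or variable expansion)."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a KEY=VALUE pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


print(parse_env(ENV_TEXT))
```

All values come back as strings, so a script that needs a number (such as a port) has to convert it itself.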

You can then instantiate the development environment by issuing the following command:

```
docker-compose up

# Or, if you've made changes to the Dockerfile or to `requirements.txt`:
docker-compose up --build
```

Note: That will cause Docker to instantiate a container for each service described in docker-compose.yml.

The mongo container will automatically start running MongoDB.

The app container, which has all the Python scripts' dependencies installed, will automatically start running the web server.

With the development environment up and running, you can access a bash shell running on the app container by issuing the following command:

```
docker exec -it app bash
```

Testing

The tests in this repository were written using pytest, a Python test framework and test runner.

With the development environment up and running, you can run all the tests in the repository by issuing the following command:

```
# From the `app` container:
pytest

# Or, from the Docker host:
docker exec -it app pytest
```

Note: You can invoke pytest with the -v option to see a list of the tests that were run.

In addition, you can use coverage to measure code coverage while running the tests, and then display a code coverage report, by issuing the following command:

```
# From the `app` container:
coverage run -m pytest && coverage report

# Or, from the Docker host:
docker exec -it app bash -c "coverage run -m pytest && coverage report"
```

Note: You can invoke coverage report with the -m option (as in, coverage report -m) to see which lines of code were "missed" (i.e. not executed).

Static type checking

You can use mypy to perform static type checking on the Python code in this repository.

With the development environment up and running, you can perform static type checking by issuing the following command:

```
# From the `app` container:
mypy

# Or, from the Docker host:
docker exec -it app mypy
```

Note: When you run mypy as shown above, it will run according to the configuration specified in mypy.ini.
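
As an illustration of the kind of mistake static type checking can catch, here is a small sketch. The function `to_float` is hypothetical and not taken from this repository, and the exact diagnostics you see depend on the settings in mypy.ini.

```python
from typing import Optional


def to_float(value: str) -> Optional[float]:
    """Convert text to a float, or return None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


result = to_float("12.5")

# With mypy's default handling of Optional, using `result + 1.0` directly would be
# reported, because `result` may be None. Checking for None first narrows the type.
if result is not None:
    print(result + 1.0)
```

The type hints cost nothing at runtime; they only give mypy enough information to flag a call site that forgets the `None` case.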

Code formatting

The Python code in this repository is formatted using Black, an "opinionated" but still PEP 8-compliant code formatter.

With the development environment up and running, you can format all the Python code in the repository by issuing the following command:

```
# From the `app` container:
black .

# Or, from the Docker host:
docker exec -it app black .
```

Dependencies

I wrote the Python scripts in this repository using Python 3.10.

The requirements.txt file contains a list of all the dependencies of the Python scripts in this repository. I generated the file by issuing the following command:

```
# From the `app` container:
pip freeze > requirements.txt

# Or, from the Docker host:
docker exec -it app pip freeze > requirements.txt
```

The table below contains the names of all the packages I explicitly installed via pip install <name>:

| Name | Description | I use it to... | References |
| --- | --- | --- | --- |
| `black` | Code formatter | Format Python code | Documentation |
| `coverage` | Code coverage measurement tool | Measure test coverage | Documentation |
| `fastapi` | HTTP API framework | Process HTTP requests | Documentation |
| `httpx` | HTTP client | Submit HTTP requests (in tests) | Documentation |
| `mypy` | Static type checker | Verify data type consistency | Documentation |
| `pymongo` | Synchronous MongoDB driver | Interact with the database | Documentation |
| `pytest` | Test framework | Run the tests | Documentation |
| `python-dotenv` | Configuration loader | Read the `.env` file | Documentation |
| `typer[all]` | CLI framework | Process CLI input and output | Documentation |
| `uvicorn[standard]` | ASGI web server | Serve the FastAPI app | Documentation |

Note: Packages that appear in requirements.txt but not in the table above were installed automatically by pip when I installed the packages in the table. In other words, they are "dependencies of dependencies".

Roadmap

Here are some additional things I'm thinking about doing in this repository:

Create a Pydantic model representing the "sample" object and use it to (a) validate and sanitize the data extracted from the CSV file (e.g. "-9999" → None); (b) display the API response's JSON schema in the API docs; and (c) filter out the _id field from the API response. Item (a) would happen in parser.py, and items (b) and (c) would happen in server.py. Items (a) and (c) already happen, but not via a Pydantic model.
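
For readers unfamiliar with items (a) and (c), the sketch below shows the general idea using plain dictionaries. It is a standard-library stand-in, not the repository's implementation and not a Pydantic model; the function names and the `Percent_Fine_Sand` column are hypothetical.

```python
from typing import Any, Optional

MISSING_VALUE = "-9999"  # Placeholder value mentioned in the roadmap item above


def sanitize(row: dict[str, str]) -> dict[str, Optional[str]]:
    """Item (a): replace the placeholder with None, leaving other values as-is."""
    return {key: (None if value == MISSING_VALUE else value) for key, value in row.items()}


def without_id(document: dict[str, Any]) -> dict[str, Any]:
    """Item (c): drop MongoDB's `_id` field before returning a document."""
    return {key: value for key, value in document.items() if key != "_id"}


# Hypothetical row as it might come out of a CSV reader.
row = {"Sample_ID": "S19S_0002_BULK-D", "Percent_Fine_Sand": "-9999"}
print(sanitize(row))

# Hypothetical document as it might come back from the database.
document = {"_id": "abc123", "Sample_ID": "S19S_0002_BULK-D"}
print(without_id(document))
```

A Pydantic model would centralize these rules in one class and, as item (b) notes, would also let the API docs display the response schema.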
