
cryptomonitor


Commits

List of commits on branch master.
Verified
331e7ece7e5cab02b4bcd90ae592b22eb74b4a3b

Update README.md

lukemaxwell committed 2 years ago
Unverified
75be0ab0a4aaf49ec6196e4d75b8854079aefe03

Merge branch 'master' of github.com:lukemaxwell/cryptomonitor

lukemaxwell committed 2 years ago
Unverified
695667acc4a74f7131a6a03d967a362959bb1df9

Remove stale code

lukemaxwell committed 2 years ago
Verified
be6ebf28907560b047f84e22548a27e6b3ca9902

Update README.md

lukemaxwell committed 2 years ago
Verified
fd213181fe4fca73c22a98814a1593fb783b33ea

Update README.md

lukemaxwell committed 2 years ago
Verified
22afebe8d52439cc29aca588b459e358a7648726

Update README.md

lukemaxwell committed 2 years ago

README


Cryptomonitor

Cryptomonitor is a demo application designed to monitor crypto news feeds in real time and store articles that match configurable rules.

The application is built using FastAPI and SQLAlchemy (async). It checks for new feed content every 10 seconds using a FastAPI background task.
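The polling behaviour described above can be sketched with plain asyncio. This is only an illustration of the pattern, not the project's actual code: `run_periodically`, `check_feeds` and the `max_runs` parameter are hypothetical names, and how the real application schedules its FastAPI background task is not shown in this README.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def run_periodically(
    task: Callable[[], Awaitable[None]],
    interval_seconds: float,
    max_runs: Optional[int] = None,
) -> int:
    """Await `task` repeatedly, sleeping `interval_seconds` between runs.

    `max_runs` exists only so the example can terminate; a real poller
    would loop until it is cancelled. Returns the number of runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            await task()
        except Exception as exc:
            # Keep polling even if a single check fails.
            print(f"periodic task failed: {exc!r}")
        runs += 1
        if max_runs is None or runs < max_runs:
            await asyncio.sleep(interval_seconds)
    return runs


async def check_feeds() -> None:
    # Hypothetical placeholder for fetching and parsing the configured feeds.
    print("checking feeds")
```

With the 10 second interval from the description, the loop could be started on application startup with something like `asyncio.create_task(run_periodically(check_feeds, 10))`.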

If articles must be fetched from the source website, they are queued in the database as ArticleJobs.

Article jobs are checked every 10 seconds by a separate background task, and articles are fetched in a rate-limited fashion (no more than 1 request per host every 5 seconds).
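One way to enforce a "1 request per host every 5 seconds" limit is to remember, per host, the earliest time the next request may be sent. The sketch below is an assumption about how this could be done, not the project's implementation; `HostRateLimiter` and its methods are hypothetical names, and the clock is injectable only to make the behaviour easy to check.

```python
import asyncio
import time
from typing import Callable, Dict
from urllib.parse import urlparse


class HostRateLimiter:
    """Allow at most one request per host every `min_interval` seconds."""

    def __init__(
        self,
        min_interval: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        # Earliest time (per host) at which the next request may start.
        self._next_allowed: Dict[str, float] = {}

    def reserve(self, url: str) -> float:
        """Reserve the next free slot for the URL's host.

        Returns the delay in seconds the caller should wait before
        sending the request (0.0 if it may be sent immediately).
        """
        host = urlparse(url).netloc
        now = self._clock()
        slot = max(now, self._next_allowed.get(host, now))
        self._next_allowed[host] = slot + self.min_interval
        return slot - now

    async def wait(self, url: str) -> None:
        """Sleep until the reserved slot for this URL's host arrives."""
        delay = self.reserve(url)
        if delay > 0:
            await asyncio.sleep(delay)
```

A fetcher would call `await limiter.wait(url)` before each request. Because `reserve` contains no `await`, concurrent coroutines on the same event loop cannot reserve the same slot.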

Feeds and rules can be configured using the API. The article body is printed to stdout, and the JSON format can be retrieved using the articles API endpoint or via the websocket.
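The rule format is not described in this README. Purely as an illustration, assuming a rule is a named list of keywords matched case-insensitively against the article text, rule matching might look like the sketch below. `Rule` and `matching_rules` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape: matches when any keyword appears in the text.
    name: str
    keywords: Tuple[str, ...]


def matching_rules(text: str, rules: Iterable[Rule]) -> List[str]:
    """Return the names of rules with at least one keyword in `text`.

    Matching is a case-insensitive substring test.
    """
    lowered = text.lower()
    return [
        rule.name
        for rule in rules
        if any(keyword.lower() in lowered for keyword in rule.keywords)
    ]
```

Under this assumption, an article would be stored only when `matching_rules` returns a non-empty list.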

The article job queue can also be viewed via the API.

Improvements

Architecture

This is a small, self-contained application used for the purposes of demonstration, and to see what can be done with FastAPI background tasks. However, doing this volume of background work in the service that is running the API is unwise.

There are various options for improvement, all of which were out of scope for this example.

  1. Run the feed and article tasks in separate containers. There is an example of this in the docker-compose file. This is a very small improvement and would break the websocket (as the websocket listener is run by the API). An easy fix would be to have the article task create articles through the API endpoint. This is slightly more robust and would work with ECS/EKS.

  2. Use proper background workers for the collection tasks, e.g. Celery. This is much more robust and better for ongoing collection and debugging. At this point the benefit of asynchronous web requests would begin to diminish (or simply add unwanted debugging complexity).

  3. Use a hybrid serverless solution where feed and article tasks are offloaded to Lambdas for parallel processing. This has similar benefits to the option above, with the opportunity for greater parallelism.

  4. Use a fully serverless approach using HTTP gateway, API Gateway, SNS/SQS queues and triggers. This scales to a large volume of data collection and allows the full range of AWS services, but would require significant changes to the code.
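As a rough sketch of the fix mentioned in option 1, an article task running in its own container could send each article to the API over HTTP instead of writing to the database directly. The base URL, the `/articles` path and the payload fields below are assumptions for illustration and are not taken from the project. The standard library is used to keep the example self-contained; an async worker would more likely use an async HTTP client.

```python
import json
from urllib.request import Request, urlopen

# Assumption: placeholder base URL for the API service.
API_BASE_URL = "http://api:8000"


def build_create_article_request(article: dict, base_url: str = API_BASE_URL) -> Request:
    """Build a POST request carrying the article as a JSON body."""
    body = json.dumps(article).encode("utf-8")
    return Request(
        url=f"{base_url}/articles",  # hypothetical endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_article(article: dict, base_url: str = API_BASE_URL) -> int:
    """Send the article to the API and return the HTTP status code.

    This call blocks, so it should not be awaited from the event loop.
    """
    with urlopen(build_create_article_request(article, base_url), timeout=10) as response:
        return response.status
```

Since the API process would then create the article itself, its websocket listener could keep broadcasting new articles.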

Usage

$ cd infrastructure
$ docker-compose up -d
$ docker-compose logs -f api

Notes

Needs a lot more error handling.

Needs tests.

There is no housekeeping of the article job queue. If there are issues or the db is closed while tasks are running, article jobs may be left stuck in the processing state and the task will no longer collect them. A cleanup task could be created to take care of stale article jobs.
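A cleanup task of this kind could be as simple as the sketch below, which resets jobs that have been in the processing state for too long. The `ArticleJob` fields, the status names and the five minute threshold are assumptions; the real model may differ, and a real version would run as an update against the database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ArticleJob:
    # Hypothetical fields; the project's ArticleJob model may differ.
    url: str
    status: str  # assumed values: "pending", "processing", "done"
    updated_at: datetime


def requeue_stale_jobs(
    jobs: List[ArticleJob],
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),
) -> int:
    """Reset jobs stuck in "processing" for longer than `max_age`.

    Returns the number of jobs moved back to "pending".
    """
    requeued = 0
    for job in jobs:
        if job.status == "processing" and now - job.updated_at > max_age:
            job.status = "pending"
            job.updated_at = now
            requeued += 1
    return requeued
```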

There are no db migrations (e.g. with Alembic).

The collection of articles is inefficient. Async parallelism could be maximised by using a window query to get pending jobs. See here.

There is a runtime warning from aiohttp visible in the logs. This appears to be a known issue with aiohttp: https://github.com/aio-libs/aiohttp/issues/4282

Probably a lot of undiscovered bugs (this was written in a short space of time).
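To illustrate the window query idea, the sketch below selects the oldest pending job for each host in a single query, so that one job per host can be fetched in parallel without breaking the per-host rate limit. The table and column names are assumptions, and SQLite is used only to keep the example self-contained (window functions need SQLite 3.25 or later).

```python
import sqlite3

# ROW_NUMBER() numbers the pending jobs within each host, oldest first;
# keeping only position 1 yields one job per host.
NEXT_JOB_PER_HOST = """
SELECT id, host, url
FROM (
    SELECT id, host, url,
           ROW_NUMBER() OVER (PARTITION BY host ORDER BY id) AS position
    FROM article_job
    WHERE status = 'pending'
) AS ranked
WHERE position = 1
ORDER BY id
"""


def create_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a few hypothetical article jobs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE article_job ("
        "id INTEGER PRIMARY KEY, host TEXT, url TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO article_job (host, url, status) VALUES (?, ?, ?)",
        [
            ("a.example", "https://a.example/1", "pending"),
            ("a.example", "https://a.example/2", "pending"),
            ("b.example", "https://b.example/1", "pending"),
            ("b.example", "https://b.example/2", "done"),
            ("c.example", "https://c.example/1", "done"),
            ("c.example", "https://c.example/2", "pending"),
        ],
    )
    return conn


def next_job_per_host(conn: sqlite3.Connection) -> list:
    """Return (id, host, url) for the oldest pending job of every host."""
    return conn.execute(NEXT_JOB_PER_HOST).fetchall()
```

The same `ROW_NUMBER() OVER (PARTITION BY ...)` construct can be expressed in SQLAlchemy, but that form is not shown here.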