The aim of this small POC is to build an end-to-end machine learning pipeline that classifies Fashion-MNIST images using the DVC (Data Version Control) framework, and then to deploy the whole ML pipeline using GitHub Actions as the CI/CD pipeline. A further goal is to understand how ML pipelines work with a state-of-the-art framework such as DVC and to view the model's results in DVC Studio. The POC can be extended to deploy the ML pipeline on AWS.
Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. Zalando intends Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and the same structure of training and testing splits.
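The figures above can be captured in a few constants. The following is a minimal sketch in plain Python, not code from this repository: the names `CLASS_NAMES`, `IMAGE_SIZE`, and `is_valid_example` are hypothetical, and the 0-255 pixel range assumes the usual 8-bit grayscale encoding. The label-to-class mapping is the one published with Fashion-MNIST.

```python
# Illustrative constants for Fashion-MNIST; all names here are hypothetical
# and are not taken from this repository.
IMAGE_SIZE = 28          # each image is 28x28 pixels
TRAIN_EXAMPLES = 60_000  # size of the training split
TEST_EXAMPLES = 10_000   # size of the test split

# Label index -> class name, as published with the dataset.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def is_valid_example(image, label):
    """Return True if `image` is a 28x28 grid of 0-255 ints and `label` is 0-9."""
    if not isinstance(label, int) or not 0 <= label < len(CLASS_NAMES):
        return False
    if len(image) != IMAGE_SIZE:
        return False
    # Every row must have 28 pixels, each an integer in the assumed 8-bit range.
    return all(
        len(row) == IMAGE_SIZE
        and all(isinstance(p, int) and 0 <= p <= 255 for p in row)
        for row in image
    )


if __name__ == "__main__":
    blank = [[0] * IMAGE_SIZE for _ in range(IMAGE_SIZE)]
    print(is_valid_example(blank, 9), CLASS_NAMES[9])
```

A check like this can be useful at the start of a data-preparation stage, so that a malformed download fails early instead of surfacing later as a training error.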
- Create a conda environment in your project directory (for example, from the VS Code terminal). You can also clone this repository and work inside it.
conda create --prefix ./env python=3.7 -y
- Activate the conda environment
conda activate ./env
or, with conda versions older than 4.4:
source activate ./env
- Install the requirements
pip install -r requirements.txt
- Initialize the DVC project
dvc init
- Run the ML pipeline using the command
dvc repro
- View the ML pipeline setup using the command
dvc dag
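The DVC steps above can also be scripted. The sketch below is one hedged way to do that in Python, assuming it is run from the project root inside the activated environment. The function names `plan_commands` and `run_pipeline` are hypothetical and not part of this repository. It skips dvc init when a .dvc directory already exists (for example, in a clone of a project that was already initialized), because dvc init reports an error in that case unless it is forced.

```python
import shutil
import subprocess
from pathlib import Path


def plan_commands(project_dir=".", dvc_installed=None):
    """Return the commands to run, in order, for the given project directory."""
    if dvc_installed is None:
        # shutil.which returns None when no dvc executable is on PATH.
        dvc_installed = shutil.which("dvc") is not None

    commands = []
    if not dvc_installed:
        commands.append(["pip", "install", "dvc"])
    if not (Path(project_dir) / ".dvc").is_dir():
        # Only initialize when the project has no .dvc directory yet.
        commands.append(["dvc", "init"])
    commands.append(["dvc", "repro"])  # run the pipeline stages
    commands.append(["dvc", "dag"])    # print the graph of stages
    return commands


def run_pipeline(project_dir="."):
    """Run each planned command, stopping at the first failure."""
    for command in plan_commands(project_dir):
        print("$", " ".join(command))
        subprocess.run(command, cwd=project_dir, check=True)


if __name__ == "__main__":
    run_pipeline()
```

Keeping the planning step separate from execution makes it possible to inspect the command list without running anything. Note that dvc init expects to run inside a Git repository unless --no-scm is passed.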
Note:
- DVC must be installed before running dvc init or dvc repro. It can be installed using
pip install dvc
- Experiment results can be viewed in Iterative Studio (DVC Studio) at https://studio.iterative.ai
- CI/CD pipelines can be created on GitHub using Continuous Machine Learning (CML): https://github.com/iterative/cml#getting-started