
TrackML at CERN

Author: Michelle Casbon

An example of processing LHC data on Kubeflow, based on the Kaggle TrackML Particle Tracking Challenge.

  1. Create a cluster and install Kubeflow
  2. Run a notebook
  3. Run a pipeline
  4. Run hyperparameter tuning

Create a cluster and install Kubeflow

Create a cluster with Click-to-deploy using default settings. Follow the provided instructions to set up OAuth credentials.

After the cluster is available and you are able to access the Kubeflow central dashboard, enable auto-provisioning with the following command:

gcloud beta container clusters update kubeflow \
  --zone europe-west1-b \
  --enable-autoprovisioning \
  --max-cpu 128 \
  --max-memory 1120 \
  --max-accelerator type=nvidia-tesla-k80,count=4 \
  --verbosity error

Run a notebook

From the Kubeflow central dashboard, click on Notebooks and spawn a new instance. Use all defaults except for the following parameters:

CPU: 2

Memory: 12.0Gi

When the notebook instance is ready, click Connect and open a new Terminal. Run these commands to install the necessary libraries:

git clone https://github.com/LAL/trackml-library.git src/trackml-library
pip install src/trackml-library
pip install pandas matplotlib seaborn
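Notebook kernels sometimes run in a different environment than the terminal, so it can be worth verifying from Python that the packages actually resolved. This is a small stdlib-only check (not part of the repository); the module names are the ones installed above:

```python
import importlib.util

def missing_modules(modules):
    """Return the subset of module names that cannot be imported in this kernel."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

# Packages installed in the terminal step above; an empty list means all resolved.
print(missing_modules(["trackml", "pandas", "matplotlib", "seaborn"]))
```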

Download the sample data with these commands:

mkdir input
gsutil cp gs://chasm-data/kaggle/trackml-particle-identification/train_sample.zip input
cd input
unzip train_sample.zip
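If you prefer to stay inside the notebook, the unzip step can equally be done with Python's standard library. A sketch (the path assumes the input/ directory created above):

```python
import zipfile
from pathlib import Path

def extract_archive(archive_path):
    """Extract a zip archive next to itself and return the member names."""
    archive = Path(archive_path)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(archive.parent)
        return zf.namelist()

# e.g. extract_archive("input/train_sample.zip")
```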

Upload the file notebooks/trackml-problem-explanation-and-data-exploration.ipynb, which was adapted for Kubeflow v0.5.0 from Wesam Elshamy's Kaggle kernel, and open the notebook.

Run a pipeline

Build docker images

Each step in a pipeline references a container image. Build the necessary Docker images with these commands:

docker/build.sh kfp_kubectl
docker/build.sh trackml

Compile the pipeline

In a local terminal or Cloud Shell, install the Kubeflow Pipelines Python SDK by running this command:

pip install -U kfp

Compile the pipeline by downloading and running it directly (curl does not preserve the executable bit, so invoke the script with Python):

curl -O https://raw.githubusercontent.com/texasmichelle/kubeflow-cern/master/pipelines/trackml.py
python3 trackml.py
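Running the script produces trackml.py.tar.gz in the current directory. A compiled Kubeflow Pipelines package is typically an ordinary gzipped tarball wrapping the generated workflow YAML, so you can sanity-check it with the standard library before uploading. This helper is an illustration, not part of the repository:

```python
import tarfile

def list_pipeline_package(path):
    """List the members of a compiled pipeline package (a gzipped tar archive)."""
    with tarfile.open(path, "r:gz") as tf:
        return tf.getnames()

# e.g. list_pipeline_package("trackml.py.tar.gz")
```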

Upload and run the pipeline

From the Kubeflow central dashboard, click on Pipeline Dashboard, then Upload pipeline. Select the file you just created (trackml.py.tar.gz), then click Upload.

Run the pipeline by first creating an experiment, then a run.

Run the pipeline from a notebook

From the Kubeflow central dashboard, click on Notebooks, then upload the file notebooks/trackml-pipeline.ipynb.

Run the notebook and click on the resulting links to view the pipeline executing.

Run it once more, this time requesting GPU resources, and watch auto-provisioning add a GPU node to the cluster before training executes.

Run hyperparameter tuning

Run the gpu-example on the cluster with this command:

kubectl apply -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha1/gpu-example.yaml

Observe auto-provisioning spin up 2 extra GPU nodes (5 total: 2 CPU, 3 GPU).
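Auto-provisioning reacts to pending pods whose resource requests no existing node can satisfy. As an illustration (a hypothetical minimal spec, not a file from this repository), a pod with a GPU limit like the following is the kind of workload that forces GKE to add an accelerator node:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo            # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:10.0-base   # any GPU-capable image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1          # the request that triggers auto-provisioning
```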