
TrackML at CERN

Author: Michelle Casbon

An example of LHC data on Kubeflow based on the Kaggle TrackML Particle Tracking Challenge.

  1. Create a cluster and install Kubeflow
  2. Run a notebook
  3. Run a pipeline
  4. Run hyperparameter tuning

Create a cluster and install Kubeflow

Create a cluster with Click-to-deploy using default settings. Follow the provided instructions to set up OAuth credentials.

After the cluster is available and you are able to access the Kubeflow central dashboard, enable auto-provisioning with the following command:

gcloud beta container clusters update kubeflow \
  --zone europe-west1-b \
  --enable-autoprovisioning \
  --max-cpu 128 \
  --max-memory 1120 \
  --max-accelerator type=nvidia-tesla-k80,count=4 \
  --verbosity error

Run a notebook

From the Kubeflow central dashboard, click on Notebooks and spawn a new instance. Use all defaults except for the following parameters:

CPU: 2

Memory: 12.0Gi

When the notebook instance is ready, click Connect and open a new Terminal. Run these commands to install the necessary libraries:

git clone https://github.com/LAL/trackml-library.git src/trackml-library
pip install src/trackml-library
pip install pandas
pip install matplotlib
pip install seaborn

Download sample data with these commands:

mkdir input
gsutil cp gs://chasm-data/kaggle/trackml-particle-identification/train_sample.zip input
cd input
unzip train_sample.zip
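The sample archive typically unpacks into per-event CSV files (hits, cells, particles, truth) that share a nine-digit event prefix; the notebook later loads them through the trackml-library. A minimal stdlib sketch of grouping such files by event id (the filenames below are illustrative, not a listing of the actual archive):

```python
import re
from collections import defaultdict

def group_by_event(filenames):
    """Group TrackML CSV filenames by their nine-digit event prefix."""
    events = defaultdict(dict)
    pattern = re.compile(r"(event\d{9})-(hits|cells|particles|truth)\.csv$")
    for name in filenames:
        m = pattern.search(name)
        if m:
            # Map e.g. "event000001000" -> {"hits": "...-hits.csv", ...}
            events[m.group(1)][m.group(2)] = name
    return dict(events)

# Hypothetical filenames following the TrackML naming convention.
sample = [
    "event000001000-hits.csv",
    "event000001000-truth.csv",
    "event000001001-hits.csv",
]
print(sorted(group_by_event(sample)))  # ['event000001000', 'event000001001']
```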

Upload the file notebooks/trackml-problem-explanation-and-data-exploration.ipynb, which was adapted from Wesam Elshamy's Kaggle Kernel for use on Kubeflow v0.5.0, and open the notebook.

Run a pipeline

Build Docker images

Each step in a pipeline references a container image. Build the necessary Docker images with these commands:

docker/build.sh kfp_kubectl
docker/build.sh trackml

Compile the pipeline

In a local terminal or Cloud Shell, install the Kubeflow Pipelines Python SDK by running this command:

pip install -U kfp

Download the pipeline definition and compile it by running it directly:

curl -O https://raw.githubusercontent.com/texasmichelle/kubeflow-cern/master/pipelines/trackml.py
chmod +x trackml.py
./trackml.py
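The compiled package is a gzipped tarball which, in Kubeflow Pipelines v1, typically contains the Argo workflow definition as pipeline.yaml. A small stdlib sketch for inspecting such a package (the helper name is mine, not part of the repo):

```python
import tarfile

def list_pipeline_package(path):
    """Return the member names inside a compiled KFP .tar.gz package."""
    with tarfile.open(path, "r:gz") as tar:
        return tar.getnames()

# Usage, assuming compilation produced trackml.py.tar.gz:
# list_pipeline_package("trackml.py.tar.gz")
```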

Upload and run the pipeline

From the Kubeflow central dashboard, click on Pipeline Dashboard, then Upload pipeline. Select the file you just created (trackml.py.tar.gz) and click Upload.

Run the pipeline by first creating an experiment, then a run.

Run the pipeline from a notebook

From the Kubeflow central dashboard, click on Notebooks, then Upload the file notebooks/trackml-pipeline.ipynb.

Run the notebook and click on the resulting links to view the pipeline executing.

Run it once more, this time requesting GPU resources, and watch auto-provisioning add a GPU node to the cluster before training executes.
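In the Kubeflow Pipelines v1 SDK, a pipeline step typically requests an accelerator with something like `train_op.set_gpu_limit(1)`, which surfaces on the step's pod as a resource limit of roughly this shape (an illustrative fragment, not copied from this repo's pipeline):

```yaml
# Illustrative pod-spec fragment; values are assumptions.
resources:
  limits:
    nvidia.com/gpu: 1
```

It is this unsatisfiable GPU limit that triggers node auto-provisioning on the cluster.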

Run hyperparameter tuning

Run the Katib gpu-example on the cluster with this command:

kubectl apply -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha1/gpu-example.yaml

Observe auto-provisioning spin up two extra GPU nodes (five nodes total: two CPU, three GPU).