Benefault

public

2 stars

1 fork

0 issues

Commits

List of commits on branch master.

Verified

901d915007184db64277f414df520150313a51d7

Update README.md

GuanhuaWang committed 7 years ago

Verified

fc0e327df5a5f2e4203c75ab749ab805c8b79e2b

Update JCudaDFVectorAdd.scala

GuanhuaWang committed 7 years ago

Verified

e758578beeef8fa098f47211b976a160512688f2

Update PhaseCount.scala

GuanhuaWang committed 7 years ago

Verified

b5fa8e4c36c8378bd05b9ebb0eb47a553a9d732e

Update JCudaDFVectorAdd.scala

GuanhuaWang committed 7 years ago

Unverified

2d907d5535b80affc273de5a5adb6300791e3f21

Update wordapp.scala

GuanhuaWang committed 7 years ago

Unverified

56a3ac89b5adc83ec7d20cbab966ecc9deee49af

Update sort.sbt

GuanhuaWang committed 8 years ago

README

Benefault

An approach to task preemption in big data analytics platforms

What we have done

a simple shell script for monitoring each node's metadata (e.g. disk access, network Tx/Rx) in a cluster
read and write support for checkpointing data (note: checkpointRead is private in Spark, so our function has to be packaged into org.apache.spark)
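
The packaging note above can be illustrated with a minimal sketch. It assumes the restriction in question is of the same kind as `SparkContext.checkpointFile`, which Spark declares `protected[spark]`, so a caller has to live inside the `org.apache.spark` package to reach it. The `CheckpointIO` object and its method names are hypothetical and are not taken from this repository; Spark is assumed to be on the classpath.

```scala
// Declared inside org.apache.spark so that protected[spark] members are visible.
package org.apache.spark

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hypothetical helper for illustration; not part of Spark itself.
object CheckpointIO {

  // Write: mark the RDD for checkpointing, then run an action so the data
  // is actually materialized. sc.setCheckpointDir(...) must be called first,
  // and checkpoint() should be called before any job has run on this RDD.
  def write[T](rdd: RDD[T]): Option[String] = {
    rdd.checkpoint()
    rdd.count()
    rdd.getCheckpointFile // directory the checkpoint was written to, if any
  }

  // Read: rebuild an RDD from a checkpoint directory written earlier.
  // SparkContext.checkpointFile is protected[spark], hence the package above.
  def read[T: ClassTag](sc: SparkContext, path: String): RDD[T] =
    sc.checkpointFile[T](path)
}
```

A caller would typically pass the path returned by `write` to `read`, possibly from a different job that uses the same storage.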

We have already run simulations of the job completion time (JCT) gain achievable with Benefault

In these simulations the performance gain is 15-30%

We test latency in a variety of scenarios

Measure checkpoint latency using Spark
Word Count with checkpointing
Sorting with checkpointing
GroupByKey with checkpointing
DecisionTree with periodic checkpointing
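
One possible way to measure checkpoint latency for the word count scenario is sketched below. It runs the same job once without and once with checkpointing and reports both wall-clock times, so the difference gives a rough estimate of checkpoint overhead. The object name, the checkpoint directory, and the tiny in-memory input are placeholders chosen for illustration, not values from this repository, and the first job also absorbs Spark warm-up cost.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CheckpointLatency {

  // Run a block and return its result together with the elapsed milliseconds.
  def time[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }

  // Word count lineage; built twice so each run starts from a fresh RDD.
  def wordCounts(sc: SparkContext): RDD[(String, Int)] =
    sc.parallelize(Seq("a b a", "b c", "a")) // placeholder input
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CheckpointLatency").setMaster("local[*]")
    val sc = new SparkContext(conf)
    sc.setCheckpointDir("/tmp/benefault-checkpoints") // hypothetical path

    // Baseline: the job alone.
    val (_, plainMs) = time(wordCounts(sc).count())

    // Same job, with the checkpoint written as part of the action.
    val checkpointed = wordCounts(sc)
    checkpointed.checkpoint()
    val (_, checkpointMs) = time(checkpointed.count())

    println(f"without checkpoint: $plainMs%.1f ms, with checkpoint: $checkpointMs%.1f ms")
    sc.stop()
  }
}
```

For stable numbers the measurement would need several repetitions and a realistic input size; a single run like this one is dominated by startup effects.
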

We are now designing schemes to evaluate the best gain achievable with Benefault
find the sweet spot for deciding whether to kill or preempt a task
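
As a rough illustration of what such a sweet spot could look like, the sketch below uses a deliberately simple cost model that is an assumption, not the scheme used in this repository: killing a task wastes the work it has done so far, while preempting it costs one checkpoint write now plus one checkpoint read on resume. Under that model, preemption pays off once the elapsed work exceeds the combined checkpoint cost. All names are hypothetical.

```scala
object PreemptionPolicy {
  sealed trait Action
  case object Kill extends Action
  case object Preempt extends Action

  // Assumed break-even point (seconds): the elapsed run time at which losing
  // the work costs as much as writing and re-reading a checkpoint.
  def breakEvenSec(checkpointWriteSec: Double, checkpointReadSec: Double): Double =
    checkpointWriteSec + checkpointReadSec

  // Preempt only when the work that would be lost outweighs the checkpoint cost.
  def decide(elapsedSec: Double,
             checkpointWriteSec: Double,
             checkpointReadSec: Double): Action =
    if (elapsedSec > breakEvenSec(checkpointWriteSec, checkpointReadSec)) Preempt
    else Kill
}
```

For example, `PreemptionPolicy.decide(10.0, 2.0, 1.0)` returns `Preempt`, while `PreemptionPolicy.decide(1.0, 2.0, 1.0)` returns `Kill`. A real policy would also have to account for queueing delay, checkpoint size, and cluster load, which this model ignores.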