Hoplite: Efficient and Fault-Tolerant Collective Communication for Task-Based Distributed Systems

This is the repository for the artifact evaluation of the SIGCOMM 2021 paper: Hoplite: Efficient and Fault-Tolerant Collective Communication for Task-Based Distributed Systems. For any questions or related issues, please feel free to contact Siyuan Zhuang (s.z@berkeley.edu) and Zhuohan Li (zhuohan@berkeley.edu).

Set Up AWS Cluster & Hoplite

All the experiments in the paper are evaluated on AWS. We use the Ray cluster launcher to launch the cluster for every experiment in the paper. We highly recommend that you also use the Ray cluster launcher, since it automatically sets up the execution environment the experiments require.

For every experiment, we include detailed instructions for setting up a cluster and reproducing the results in the paper.
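As a rough illustration of the Ray cluster launcher workflow, the sketch below builds a minimal cluster config in Python and hands it to Ray's autoscaler SDK, the programmatic counterpart of `ray up <config>.yaml`. It is a hypothetical example, not the configuration used in the paper: the cluster name, region, instance types and node counts are placeholders, and the per-experiment directories contain the actual settings. A real config will typically also need setup commands (and possibly an AMI) to install Hoplite and its dependencies.

```python
"""Hypothetical sketch: launching an AWS cluster with the Ray cluster launcher.

All values below (cluster name, region, instance types, node counts) are
placeholders, NOT the settings used in the paper's experiments.
"""


def make_cluster_config(num_workers: int = 4) -> dict:
    # Keys follow the Ray cluster launcher schema; the values are placeholders.
    return {
        "cluster_name": "hoplite-example",  # hypothetical cluster name
        "max_workers": num_workers,
        "provider": {"type": "aws", "region": "us-west-2"},  # placeholder region
        "auth": {"ssh_user": "ubuntu"},
        "available_node_types": {
            "head": {
                "node_config": {"InstanceType": "m5.4xlarge"},  # placeholder type
                "resources": {},
                "max_workers": 0,
            },
            "worker": {
                "node_config": {"InstanceType": "m5.4xlarge"},  # placeholder type
                "resources": {},
                # Fixed-size cluster: start all workers up front.
                "min_workers": num_workers,
                "max_workers": num_workers,
            },
        },
        "head_node_type": "head",
    }


def launch(config: dict) -> None:
    # Imported lazily so the config builder works without Ray installed.
    from ray.autoscaler.sdk import create_or_update_cluster

    # Programmatic equivalent of `ray up <config>.yaml`; requires AWS credentials.
    create_or_update_cluster(config)
```

With AWS credentials configured, calling `launch(make_cluster_config(8))` would request a head node plus eight workers. Pinning `min_workers` equal to `max_workers` keeps the cluster size fixed, which is usually what you want for reproducible benchmarks.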

Microbenchmarks (Section 5.1)

Please see microbenchmarks/ to reproduce the microbenchmark experiments in the paper.

Asynchronous SGD (Section 5.2)

Please see app/parameter-server/ to reproduce the Asynchronous SGD experiments in the paper.

Reinforcement Learning (Section 5.3)

Please see app/rllib/ to reproduce the RLlib experiments in the paper.

ML Model Serving Experiments (Section 5.4)

Please see app/ray_serve/ to reproduce the Ray Serve experiments and the Ray Serve fault-tolerance experiments (Section 5.5, Figure 12a) in the paper.