CCL-Bench is a collaborative benchmarking project studying how distributed ML workloads behave across different hardware, frameworks, and communication libraries. As models scale, a distributed implementation on hardware A with library X can behave drastically differently from one on hardware B with library Y. Each group collects traces and contributes analysis tools, which are then applied across groups, making the benchmark scalable without brute-force exploration of every configuration. Metrics include MFU, estimated memory bandwidth, step time, and more.
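
Of these metrics, MFU (Model FLOPs Utilization) is the least self-explanatory. Below is a minimal sketch of how it is commonly estimated, assuming the usual ~6 FLOPs-per-parameter-per-token approximation for a forward + backward pass; the exact accounting used by CCL-Bench's analysis tools may differ:

```python
def estimate_mfu(num_params: float, tokens_per_step: float,
                 step_time_s: float, peak_flops_per_s: float) -> float:
    """Rough MFU: achieved FLOP/s divided by the hardware's peak FLOP/s.

    Uses the common ~6 FLOPs-per-parameter-per-token approximation for a
    forward + backward pass; a more exact count would include attention
    FLOPs and other terms.
    """
    achieved = 6.0 * num_params * tokens_per_step / step_time_s
    return achieved / peak_flops_per_s

# Hypothetical numbers: a 7B-parameter model, 8 sequences of 2048 tokens per
# step, 1.7 s per step, on an accelerator with ~1 PFLOP/s of BF16 peak.
print(f"MFU ~= {estimate_mfu(7e9, 8 * 2048, 1.7, 1e15):.0%}")  # -> MFU ~= 40%
```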

Part of the Cornell sysphotonics research group  ·  Maintainers: Eric Ding, Kaiwen Guo, Jelena Gvero, Byungsoo Oh, Bhaskar Kataria, Atharv Sonwane, Rachee Singh  ·  Contact: Eric Ding  ·  Contributing: GitHub

Upload Trace & Workload Card

The upload form walks through five steps: ① Contributor, ② Model & Data, ③ Hardware, ④ Framework, and ⑤ Files & Upload. Required fields: workload name, contributor name, a valid email, model family, phase, batch size, sequence length, hardware type, hardware model, total hardware count, and framework.
Trace files can be dragged & dropped or selected from disk. Accepted formats: .nsys-rep, .json, .tar, .gz, .tgz, .zip, .bz2, .xz, .zst  ·  Max 500 MB
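
A minimal sketch of the kind of client-side check these constraints imply; the helper name, the MiB reading of the 500 MB cap, and the logic itself are assumptions, not the site's actual validation code:

```python
import pathlib

ACCEPTED_SUFFIXES = {".nsys-rep", ".json", ".tar", ".gz", ".tgz",
                     ".zip", ".bz2", ".xz", ".zst"}
MAX_BYTES = 500 * 1024**2  # "500 MB" read as 500 MiB; the server may differ

def is_uploadable(path: pathlib.Path) -> bool:
    # Accept a trace if its extension is on the list and it fits the size cap.
    return path.suffix in ACCEPTED_SUFFIXES and path.stat().st_size <= MAX_BYTES
```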
What happens on upload:
① Your trace files are saved to the server
② A workload_card.yaml is auto-generated from the form fields above (a sketch of this step follows the list)
③ Both are stored under uploads/<workload-name>/
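
A hedged sketch of step ②, assuming the generated card simply mirrors the form fields; the actual schema and key names are defined server-side and may differ:

```python
import pathlib
import yaml  # PyYAML

# Hypothetical form payload; keys mirror the required fields of the form.
form = {
    "workload_name": "llama-8b-pretrain-h100",
    "contributor": {"name": "Jane Doe", "email": "jd@example.edu"},
    "model": {"family": "llama", "phase": "pretraining",
              "batch_size": 8, "sequence_length": 2048},
    "hardware": {"type": "GPU", "model": "H100", "total_count": 64},
    "framework": "pytorch",
}

# Step ③: the card is stored alongside the uploaded traces.
dest = pathlib.Path("uploads") / form["workload_name"]
dest.mkdir(parents=True, exist_ok=True)
(dest / "workload_card.yaml").write_text(yaml.safe_dump(form, sort_keys=False))
```

Keeping the card as plain YAML next to the traces means every workload stays self-describing, so cross-group analysis tools can discover inputs by walking uploads/ without a database.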