CentML/Mist

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/CentML/Mist)

CentML / Mist

[EuroSys'25] Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization

☆24

Alternatives and similar repositories for Mist

Users that are interested in Mist are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

chenyu-jiang / dcp
View on GitHub
Code repository for the SOSP'25 paper DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism.
☆21Nov 28, 2025Updated 7 months ago
oliverYoung2001 / UltraAttn
View on GitHub
SC'25 UltraAttn: Efficiently Parallelizing Attention through Hierarchical Context-Tiling
☆16Aug 14, 2025Updated 11 months ago
xinhaoc / ferret
View on GitHub
Autonomous CUDA kernel optimization agent with structured task specs and per-config scoring
☆17Jun 17, 2026Updated last month
adsl-rg / adsl-rg.github.io
View on GitHub
☆14Jul 12, 2026Updated last week
CentML / lorafusion
View on GitHub
LoRAFusion: Efficient LoRA Fine-Tuning for LLMs
☆28Jul 2, 2026Updated 2 weeks ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
romitjain / kachua-mlsys
View on GitHub
[MLSys 26] 🥇 Solution for Gated Delta Net Track of MLSys 26 Flash infer competition
☆35May 22, 2026Updated last month
infinigence / FlashOverlap
View on GitHub
A lightweight design for computation-communication overlap.
☆242Jan 20, 2026Updated 6 months ago
uccl-project / mKernel
View on GitHub
mKernel: fast multi-node, multi-GPU fused kernels
☆251Jun 21, 2026Updated 3 weeks ago
uw-syfi / vibesys
View on GitHub
Can AI Agents Build Bespoke Systems?
☆84Updated this week
sail-sg / zero-bubble-pipeline-parallelism
View on GitHub
Zero Bubble Pipeline Parallelism
☆462May 7, 2025Updated last year
kungfu-team / tenplex
View on GitHub
Dynamic resources changes for multi-dimensional parallelism training
☆31Aug 22, 2025Updated 10 months ago
TU-Berlin-DIMA / Hawk-VLDBJ
View on GitHub
☆15Dec 2, 2019Updated 6 years ago
wangrunji0408 / rjrouter
View on GitHub
[AFK] Hardware router in Chisel (THU Network Joint Lab 2020)
☆14Oct 8, 2020Updated 5 years ago
mlsys-seo / ooo-backprop
View on GitHub
☆26Dec 5, 2022Updated 3 years ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
HydraQYH / expert_specialization_moe
View on GitHub
Expert Specialization MoE Solution based on CUTLASS
☆27Apr 14, 2026Updated 3 months ago
microsoft / SuperScaler
View on GitHub
An experimental parallel training platform
☆57Mar 25, 2024Updated 2 years ago
spcl / muliticast-based-allgather
View on GitHub
☆24Feb 12, 2025Updated last year
Yoorkin / Effektos
View on GitHub
☆10Apr 10, 2024Updated 2 years ago
hao-ai-lab / DistCA
View on GitHub
Efficient Long-context Language Model Training by Core Attention Disaggregation
☆106Apr 7, 2026Updated 3 months ago
infinigence / HamiltonAttention
View on GitHub
☆45Oct 15, 2025Updated 9 months ago
google / iopddl
View on GitHub
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆25May 12, 2025Updated last year
HazyResearch / Megakernels
View on GitHub
Kernels, of the mega variety :)
☆780May 26, 2026Updated last month
osayamenja / FlashMoE
View on GitHub
Distributed MoE in a Single Kernel [NeurIPS '25]
☆272May 5, 2026Updated 2 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
KuangjuX / NVSHMEM-Tutorial
View on GitHub
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆195Feb 11, 2026Updated 5 months ago
vdcores / vdcores
View on GitHub
Virtual Decoupled Cores: Composable Programming Framework and Runtime for Async GPUs
☆19Updated this week
coderlemon17 / LemonScripts
View on GitHub
Here is the repo for public scripts.
☆12Jul 16, 2022Updated 4 years ago
LoongServe / LoongServe
View on GitHub
☆135Nov 11, 2024Updated last year
xinhao-luo / ClusterFusion
View on GitHub
[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
☆75Dec 11, 2025Updated 7 months ago
chengzeyi / piflux
View on GitHub
(WIP) Parallel inference for black-forest-labs' FLUX model.
☆19Nov 18, 2024Updated last year
PluralisResearch / AsyncPP
View on GitHub
Asynchronous pipeline parallel optimization
☆23Feb 2, 2026Updated 5 months ago
tanzelin430 / libsmctrl
View on GitHub
libsmctrl论文的复现，添加了python端接口，可以在python端灵活调用接口来分配计算资源
☆12May 21, 2024Updated 2 years ago
mit-han-lab / KernelWiki
View on GitHub
☆310Jun 9, 2026Updated last month
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
efeslab / fiddler
View on GitHub
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
☆267Nov 18, 2024Updated last year
interestingLSY / hpcgame.minesweeper
View on GitHub
☆14Jan 18, 2023Updated 3 years ago
thunlp / Seq1F1B
View on GitHub
Sequence-level 1F1B schedule for LLMs.
☆37Aug 26, 2025Updated 10 months ago
efeslab / AgentFlux
View on GitHub
☆20Dec 4, 2025Updated 7 months ago
brabbitdousha / pet_ray_tracer_inCUDA
View on GitHub
Accelerated in CUDA
☆11Oct 28, 2022Updated 3 years ago
wdlctc / mini-s
View on GitHub
☆51Oct 29, 2024Updated last year
togethercomputer / ParallelKernelBench
View on GitHub
☆41Jul 1, 2026Updated 2 weeks ago