[EuroSys'25] Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization
☆22Feb 5, 2026Updated last month
Alternatives and similar repositories for Mist
Users who are interested in Mist are comparing it to the libraries listed below
- ☆26Dec 5, 2022Updated 3 years ago
- Reproduction of the libsmctrl paper, with an added Python interface that can be called flexibly from Python to allocate compute resources☆12May 21, 2024Updated last year
- Official implementation of Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores.☆14Nov 13, 2025Updated 4 months ago
- Confidence Regulation Neurons in Language Models (NeurIPS 2024)☆15Feb 1, 2025Updated last year
- Front end of a campus epidemic prevention and control system☆14Dec 3, 2022Updated 3 years ago
- Slowdown prediction module of Echo: Simulating Distributed Training at Scale☆13May 17, 2025Updated 10 months ago
- The Zaychik Power Controller server☆13Apr 13, 2024Updated last year
- [ICLR 2025] RaSA: Rank-Sharing Low-Rank Adaptation☆10May 19, 2025Updated 10 months ago
- My Interview recording repo.☆11Mar 22, 2023Updated 3 years ago
- Pie: Programmable LLM Serving☆131Updated this week
- Memory footprint reduction for transformer models☆11Jan 24, 2023Updated 3 years ago
- A small RISC-V kernel written in C, tested on the SiFive Unmatched board.☆16Aug 20, 2022Updated 3 years ago
- Some interesting pages, deployed with GitHub Pages and Vercel☆13Feb 8, 2024Updated 2 years ago
- BUAA database course project: a Django + MySQL + Vue e-commerce application with user, merchant, and administrator sides☆11Dec 27, 2022Updated 3 years ago
- Code for "Practical Low-Rank Communication Compression in Decentralized Deep Learning"☆17Aug 4, 2020Updated 5 years ago
- ☆18Oct 15, 2020Updated 5 years ago
- Blazing fast data loading with HuggingFace Dataset and Ray Data☆16Jan 12, 2024Updated 2 years ago
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration☆261Nov 18, 2024Updated last year
- Mirror of http://gitlab.hpcrl.cse.ohio-state.edu/chong/ppopp19_ae, refactoring for understanding☆16Oct 20, 2021Updated 4 years ago
- Reinforcement learning course, mainly about how to solve problems with reinforcement learning☆15Dec 10, 2024Updated last year
- How to plot for papers, slides, demos, etc.☆10Apr 7, 2022Updated 3 years ago
- An experimental parallel training platform☆56Mar 25, 2024Updated last year
- ☆13Nov 2, 2022Updated 3 years ago
- ☆29Nov 2, 2022Updated 3 years ago
- ☆14Mar 3, 2026Updated 2 weeks ago
- Personal study notes for the computer networks course at Beihang University (BUAA)☆15Nov 10, 2020Updated 5 years ago
- kernels, of the mega variety☆690Updated this week
- RapidIn: Scalable Influence Estimation for Large Language Models (LLMs). The implementation for paper "Token-wise Influential Training Da…☆21Mar 10, 2026Updated last week
- LLM training technologies developed by Kwai☆71Jan 21, 2026Updated 2 months ago
- Shared library for intercepting CUDA Runtime API calls. This was part of my Bachelor thesis: A Study on the Computational Exploitation of…☆14Jun 6, 2024Updated last year
- A simple implementation of ReasonGenRM.☆19Apr 21, 2025Updated 11 months ago
- ☆11Mar 3, 2024Updated 2 years ago
- ☆13Mar 6, 2023Updated 3 years ago
- Repository for studying for the Algoritmos 3 final exam☆15Oct 23, 2018Updated 7 years ago
- ☆12Aug 26, 2025Updated 6 months ago
- An easily extensible framework for understanding and optimizing CUDA operators, intended for learning use only☆18Jun 13, 2024Updated last year
- Paper reading and discussion notes, covering AI frameworks, distributed systems, cluster management, etc.☆57Mar 4, 2026Updated 2 weeks ago
- Accelerating Long Context LLM Inference with Accuracy-Preserving Context Optimization in SGLang, vLLM, llama.cpp, RAG, and Agentic AI.☆65Updated this week
- [ICLR 2025] Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron☆30Apr 30, 2025Updated 10 months ago