InternLM / Awesome-LLM-Training-System
☆45 · Updated last year
Alternatives and similar repositories for Awesome-LLM-Training-System
Users interested in Awesome-LLM-Training-System are comparing it to the repositories listed below.
- ☆153 · Updated 9 months ago
- ☆58 · Updated last year
- LLM training technologies developed by Kwai ☆67 · Updated last month
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length (a vanilla speculative-decoding sketch follows this list) ☆139 · Updated this week
- nnScaler: Compiling DNN models for Parallel Training ☆121 · Updated 3 months ago
- ☆97 · Updated 9 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆360 · Updated 5 months ago
- ☆84 · Updated 8 months ago
- A simple calculation for LLM MFU (see the MFU sketch after this list). ☆51 · Updated 3 months ago
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆87 · Updated this week
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆230 · Updated 2 years ago
- Sequence-level 1F1B schedule for LLMs. ☆38 · Updated 4 months ago
- PyTorch bindings for CUTLASS grouped GEMM (semantics sketched after this list). ☆177 · Updated last week
- A lightweight design for computation-communication overlap. ☆200 · Updated 2 months ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks (see the roofline sketch after this list). ☆119 · Updated last year
- Nex Venus Communication Library ☆68 · Updated last month
- ☆88 · Updated 3 years ago
- Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap… ☆282 · Updated 9 months ago
- 16-fold memory access reduction with nearly no loss ☆109 · Updated 9 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆135 · Updated 6 months ago
- ☆104 · Updated last year
- ☆65 · Updated 8 months ago
- Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio… ☆73 · Updated 3 months ago
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of … ☆300 · Updated 6 months ago
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding ☆70 · Updated 3 weeks ago
- Implement Flash Attention using CuTe. ☆98 · Updated last year
- ☆126 · Updated last year
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving ☆331 · Updated last year
- Allow torch tensor memory to be released and resumed later ☆191 · Updated 3 weeks ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆66 · Updated last year
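
Two of the entries above build on speculative decoding (PEARL, sparse self-speculative decoding). As a point of reference, here is a minimal sketch of vanilla greedy speculative decoding, not either repository's method: `draft_model` and `target_model` are hypothetical callables standing in for real model APIs, and real systems run the verification as one batched GPU forward pass.

```python
def speculative_step(draft_model, target_model, prefix, k):
    """One greedy speculative-decoding step (vanilla scheme, fixed draft length k).

    draft_model(seq)  -> next-token id (cheap model, called k times)
    target_model(seq) -> argmax prediction after every prefix of seq,
                         i.e. preds[i] follows seq[:i+1] (one forward pass)
    """
    n = len(prefix)
    seq = list(prefix)
    for _ in range(k):                     # draft k tokens autoregressively
        seq.append(draft_model(seq))
    preds = target_model(seq)              # verify all drafts at once
    out = list(prefix)
    for j in range(k):                     # keep the longest agreeing prefix
        if seq[n + j] == preds[n + j - 1]:
            out.append(seq[n + j])
        else:
            out.append(preds[n + j - 1])   # target's correction on mismatch
            break
    else:
        out.append(preds[n + k - 1])       # bonus token when all k accepted
    return out
```

Each step thus emits between 1 and k+1 tokens per target-model call; PEARL's contribution is choosing and parallelizing the draft length adaptively rather than fixing k.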
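
For the MFU entry above: MFU is the model's achieved FLOPs throughput divided by the hardware's peak. A minimal sketch using the common ~6N FLOPs-per-token approximation for decoder-only training (2N forward + 4N backward); the function name and the example peak numbers are illustrative, not taken from that repository.

```python
def training_mfu(num_params: float, tokens_per_sec: float, peak_flops: float) -> float:
    """Model FLOPs Utilization for decoder-only LLM training.

    Approximates ~6 FLOPs per parameter per token, ignoring the
    sequence-length-dependent attention term of the full PaLM-style formula.
    """
    achieved_flops = 6.0 * num_params * tokens_per_sec
    return achieved_flops / peak_flops

# Example: a 7B model at 4,000 tokens/s per GPU against a 312 TFLOP/s (BF16)
# A100 peak gives 6 * 7e9 * 4000 / 312e12 ≈ 0.54, i.e. ~54% MFU.
print(training_mfu(7e9, 4_000, 312e12))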
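
Two entries above are PyTorch bindings for CUTLASS grouped GEMM. What they accelerate is a set of independent matmuls with possibly different shapes (e.g. per-expert GEMMs in an MoE layer) fused into a single kernel launch; the sketch below is a pure-PyTorch reference for the semantics only, not either binding's actual API.

```python
import torch

def grouped_gemm_reference(As: list[torch.Tensor], Bs: list[torch.Tensor]) -> list[torch.Tensor]:
    # Each pair (A_i, B_i) is an independent GEMM and shapes may differ
    # across pairs. A real grouped-GEMM kernel replaces this Python loop
    # with one fused launch.
    return [a @ b for a, b in zip(As, Bs, strict=True)]

# Example: three differently shaped "expert" GEMMs, as in an MoE layer.
As = [torch.randn(m, 64) for m in (5, 17, 3)]   # per-expert token counts
Bs = [torch.randn(64, 128) for _ in range(3)]   # per-expert weights
print([tuple(o.shape) for o in grouped_gemm_reference(As, Bs)])
# [(5, 128), (17, 128), (3, 128)]
```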
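
For the roofline entry: the model bounds attainable throughput by the lesser of the compute roof and the memory roof (bandwidth × arithmetic intensity). A sketch with illustrative A100-class numbers, not figures from that repository:

```python
def roofline(peak_flops: float, peak_bw: float, intensity: float) -> float:
    """Attainable FLOP/s under the roofline model.

    intensity is arithmetic intensity in FLOPs per byte moved from memory.
    """
    return min(peak_flops, peak_bw * intensity)

# Batch-1 BF16 decode reads every weight once per token: ~2 FLOPs per
# parameter against 2 bytes per parameter, i.e. intensity ≈ 1 FLOP/byte.
# With ~312 TFLOP/s peak compute and ~2 TB/s HBM bandwidth, the memory roof
# (~2 TFLOP/s) wins: single-stream decoding is memory-bandwidth-bound.
print(roofline(312e12, 2.0e12, 1.0))  # ≈ 2e12
```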