InternLM / Awesome-LLM-Training-SystemLinks

☆44

Alternatives and similar repositories for Awesome-LLM-Training-System

Users that are interested in Awesome-LLM-Training-System are comparing it to the libraries listed below

Sorting:

d-matrix-ai / keyformer-llm
☆58Updated last year
kwai / Megatron-Kwai
LLM training technologies developed by kwai
☆66Updated last week
madsys-dev / deepseekv2-profile
☆153Updated 9 months ago
smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆132Updated last month
mit-han-lab / Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆356Updated 4 months ago
microsoft / chunk-attention
☆82Updated 7 months ago
InternLM / turbomind
☆97Updated 8 months ago
galeselee / Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆282Updated 9 months ago
andy-yang-1 / DoubleSparse
16-fold memory access reduction with nearly no loss
☆108Updated 8 months ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆190Updated last month
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆142Updated 10 months ago
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆120Updated 2 months ago
AlibabaPAI / FLASHNN
☆102Updated last year
LoongServe / LoongServe
☆124Updated last year
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆132Updated 6 months ago
thu-pacman / SmartMoE-AE
ATC23 AE
☆47Updated 2 years ago
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆170Updated last month
feifeibear / LLMRoofline
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
☆119Updated last year
DeepLink-org / DLSlime
DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit
☆82Updated this week
CalvinXKY / mfu_calculation
A simple calculation for LLM MFU.
☆50Updated 2 months ago
interestingLSY / swiftLLM
A tiny yet powerful LLM inference system tailored for researching purpose. vLLM-equivalent performance with only 2k lines of code (2% of …
☆294Updated 5 months ago
thu-pacman / FasterMoE
☆88Updated 3 years ago
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆227Updated 2 years ago
AniZpZ / AutoSmoothQuant
An easy-to-use package for implementing SmoothQuant for LLMs
☆109Updated 8 months ago
cat538 / SKVQ
[COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
☆24Updated last year
efeslab / Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
☆331Updated last year
flashinfer-ai / cutlass-viz
☆65Updated 7 months ago
Equationliu / Kangaroo
[NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…
☆63Updated last year
hao-ai-lab / vllm-ltr
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
☆65Updated last year
luliyucoordinate / cute-flash-attention
Implement Flash Attention using Cute.
☆97Updated 11 months ago