InternLM / Awesome-LLM-Training-System
☆21 · Updated 5 months ago
Alternatives and similar repositories for Awesome-LLM-Training-System:
Users interested in Awesome-LLM-Training-System are comparing it to the libraries listed below.
- [USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral… ☆47 · Updated 5 months ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆29 · Updated 2 months ago
- Summary of awesome work on optimizing LLM inference ☆50 · Updated 3 weeks ago
- A sparse attention kernel supporting mixed sparse patterns ☆94 · Updated 3 months ago
- nnScaler: Compiling DNN models for Parallel Training ☆87 · Updated last week
- Curated collection of papers on MoE model inference ☆36 · Updated this week
- Since the emergence of ChatGPT in 2022, accelerating large language models has become increasingly important. Here is a list of pap… ☆205 · Updated 3 weeks ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆58 · Updated 2 months ago
- High-performance Transformer implementation in C++. ☆98 · Updated this week
- Implement Flash Attention using CuTe. ☆65 · Updated last month
- PyTorch bindings for CUTLASS grouped GEMM. ☆84 · Updated 2 weeks ago
- FP8 flash attention for the Ada architecture, implemented with the cutlass repository ☆52 · Updated 5 months ago
- ⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs to achieve peak performance⚡️ ☆43 · Updated last week
- Puzzles for learning Triton; play with minimal environment configuration! ☆205 · Updated last month
- Quantized Attention on GPU ☆34 · Updated last month
- LLM theoretical performance analysis tool supporting parameter-count, FLOPs, memory, and latency analysis ☆75 · Updated last week
- 16-fold memory access reduction with nearly no loss ☆63 · Updated 2 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆234 · Updated last month
- [ACL 2024] A novel QAT-with-Self-Distillation framework to enhance ultra-low-bit LLMs. ☆98 · Updated 8 months ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks (a minimal roofline sketch follows this list). ☆88 · Updated 10 months ago
- ATC23 AE ☆44 · Updated last year
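Several entries above (the roofline-model comparison and the theoretical performance analysis tool in particular) reason about whether LLM inference is compute-bound or memory-bound. The sketch below illustrates that style of analysis; it is a minimal, assumed example, and the peak-compute and bandwidth figures are placeholder hardware numbers, not measurements from any repository listed here.

```python
# Minimal roofline-model sketch (illustrative only; hardware numbers are assumed placeholders).

def roofline_tflops(peak_tflops: float, peak_bw_tbs: float, intensity: float) -> float:
    """Attainable throughput = min(peak compute, arithmetic intensity * peak bandwidth)."""
    return min(peak_tflops, intensity * peak_bw_tbs)

# During decode with batch size B, each fp16 weight byte read from memory is
# reused for roughly B multiply-adds, so arithmetic intensity grows with batch size.
peak_tflops = 312.0   # assumed dense fp16 peak, TFLOP/s
peak_bw_tbs = 2.0     # assumed HBM bandwidth, TB/s
for batch in (1, 8, 64, 512):
    intensity = float(batch)  # rough FLOPs per byte of weights
    attainable = roofline_tflops(peak_tflops, peak_bw_tbs, intensity)
    print(f"batch={batch:4d}  attainable ≈ {attainable:7.1f} TFLOP/s")
```

At batch size 1 the attainable throughput is capped by bandwidth (memory-bound), while large batches cross the roofline ridge and hit the compute peak, which is why many of the listed projects focus on reducing memory traffic.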