fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆151 Updated last month
Alternatives and similar repositories for grouped_gemm
Users who are interested in grouped_gemm are comparing it to the libraries listed below.
- A collection of memory-efficient attention operators implemented in the Triton language. ☆279 Updated last year
- PyTorch bindings for CUTLASS grouped GEMM. ☆121 Updated 4 months ago
- Zero Bubble Pipeline Parallelism ☆427 Updated 4 months ago
- [USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral… ☆65 Updated last year
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆216 Updated last year
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core. ☆98 Updated 2 weeks ago
- ☆98 Updated last year