thunlp / Seq1F1B
Sequence-level 1F1B schedule for LLMs.
☆16 Updated 3 weeks ago
Alternatives and similar repositories for Seq1F1B:
Users who are interested in Seq1F1B are comparing it to the libraries listed below.
- [USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral… ☆47 Updated 5 months ago
- ☆82 Updated 2 months ago
- ☆79 Updated 4 months ago
- ☆51 Updated 9 months ago
- nnScaler: Compiling DNN models for Parallel Training ☆87 Updated last week
- PyTorch bindings for CUTLASS grouped GEMM. ☆85 Updated 2 weeks ago
- ☆70 Updated 3 years ago
- ☆134 Updated 6 months ago
- High performance Transformer implementation in C++. ☆98 Updated this week
- PyTorch bindings for CUTLASS grouped GEMM. ☆58 Updated 2 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆234 Updated last month
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of … ☆132 Updated 6 months ago
- A collection of memory-efficient attention operators implemented in the Triton language. ☆230 Updated 7 months ago
- Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5). ☆230 Updated 2 months ago
- ☆94 Updated last month
- Puzzles for learning Triton, play it with minimal environment configuration! ☆205 Updated last month
- A fast communication-overlapping library for tensor parallelism on GPUs. ☆274 Updated 2 months ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆273 Updated last month
- A baseline repository of Auto-Parallelism in Training Neural Networks ☆142 Updated 2 years ago
- ☆38 Updated 7 months ago
- ☆71 Updated 5 months ago
- ☆93 Updated this week
- ☆141 Updated last week
- Sequence-level 1F1B schedule for LLMs. ☆17 Updated 7 months ago
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆76 Updated 2 months ago
- ☆72 Updated 2 years ago
- Examples of CUDA implementations by Cutlass CuTe ☆128 Updated last month
- ATC23 AE ☆44 Updated last year
- Curated collection of papers in MoE model inference ☆38 Updated this week
- Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap… ☆205 Updated last month