thunlp / Seq1F1B

Sequence-level 1F1B schedule for LLMs.

☆29

Alternatives and similar repositories for Seq1F1B

Users who are interested in Seq1F1B are comparing it to the libraries listed below.

infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆154 Updated last month
LoongServe / LoongServe
☆109 Updated 8 months ago
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆114 Updated this week
d-matrix-ai / keyformer-llm
☆54 Updated last year
AlibabaPAI / FLASHNN
☆96 Updated 10 months ago
kwai / Megatron-Kwai
[USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral…
☆61 Updated last year
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆216 Updated last year
luliyucoordinate / cute-flash-attention
Implement Flash Attention using Cute.
☆92 Updated 7 months ago
flashinfer-ai / cutlass-viz
☆60 Updated 3 months ago
zhuohan123 / terapipe
☆75 Updated 4 years ago
microsoft / SparTA
☆150 Updated last year
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆106 Updated 2 months ago
alibaba / easydist
Automated Parallelization System and Infrastructure for Multiple Ecosystems
☆79 Updated 8 months ago
madsys-dev / deepseekv2-profile
☆145 Updated 4 months ago
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆128 Updated 6 months ago
thu-pacman / SmartMoE-AE
ATC23 AE
☆46 Updated 2 years ago
InternLM / Awesome-LLM-Training-System
☆42 Updated 11 months ago
CalebDu / Awesome-Cute
☆89 Updated 2 months ago
gty111 / gLLM
gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling
☆36 Updated this week
microsoft / chunk-attention
☆78 Updated 3 months ago
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆183 Updated 6 months ago
xxyux / SpInfer
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
☆50 Updated 4 months ago
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆134 Updated 2 weeks ago
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆70 Updated 2 months ago
DD-DuDa / BitDecoding
A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache.
☆56 Updated last week
EfficientMoE / MoE-Infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
☆215 Updated 3 weeks ago
Victarry / PP-Schedule-Visualization
Pipeline Parallelism Emulation and Visualization
☆54 Updated last month
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆67 Updated 4 months ago
stepfun-ai / StepMesh
☆24 Updated last week
usyd-fsalab / fp6_llm
Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5).
☆260 Updated 2 weeks ago