MayDomine / Seq1F1BLinks

Sequence-level 1F1B schedule for LLMs.

☆18

Alternatives and similar repositories for Seq1F1B

Users that are interested in Seq1F1B are comparing it to the libraries listed below

Sorting:

feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆78Updated last year
RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆216Updated last year
CalvinXKY / mfu_calculation
A simple calculation for LLM MFU.
☆48Updated last month
smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆120Updated 6 months ago
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆130Updated 10 months ago
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆125Updated 4 months ago
fzyzcjy / torch_utils
Utility scripts for PyTorch (e.g. Memory profiler that understands more low-level allocations such as NCCL)
☆62Updated last month
andy-yang-1 / DoubleSparse
16-fold memory access reduction with nearly no loss
☆105Updated 7 months ago
microsoft / chunk-attention
☆78Updated 6 months ago
madsys-dev / deepseekv2-profile
☆148Updated 7 months ago
thunlp / Ouroboros
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
☆110Updated 7 months ago
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆156Updated 2 weeks ago
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipefor Near-Lossless Generative Inference of LLM
☆169Updated last year
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆84Updated last year
yanring / Megatron-MoE-ModelZoo
Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.
☆114Updated last week
AniZpZ / AutoSmoothQuant
An easy-to-use package for implementing SmoothQuant for LLMs
☆107Updated 6 months ago
flashinfer-ai / cutlass-viz
☆65Updated 6 months ago
Aleph-Alpha-Research / NeurIPS-WANT-submission-efficient-parallelization-layouts
☆22Updated last year
thunlp / Seq1F1B
Sequence-level 1F1B schedule for LLMs.
☆32Updated 2 months ago
InternLM / Awesome-LLM-Training-System
☆43Updated last year
Equationliu / Kangaroo
[NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…
☆60Updated last year
ISEEKYAN / mbridge
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
☆142Updated this week
efeslab / Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
☆320Updated last year
feifeibear / DPSKV3MFU
Estimate MFU for DeepSeekV3
☆26Updated 9 months ago
d-matrix-ai / keyformer-llm
☆58Updated last year
MayDomine / Burst-Attention
Distributed IO-aware Attention algorithm
☆21Updated last month
zhengzangw / Sequence-Scheduling
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
☆92Updated 2 years ago
ByteDance-Seed / cudaLLM
☆120Updated 2 months ago
stanford-futuredata / stk
☆112Updated last year
InternLM / turbomind
☆97Updated 7 months ago