RulinShao/LightSeq

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/RulinShao/LightSeq)

RulinShao / LightSeq

Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training

☆223

Alternatives and similar repositories for LightSeq

Users that are interested in LightSeq are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

RulinShao / FastCkpt
View on GitHub
Python package for rematerialization-aware gradient checkpointing
☆27Oct 31, 2023Updated 2 years ago
zhuzilin / ring-flash-attention
View on GitHub
Ring attention implementation with flash attention
☆1,037Sep 10, 2025Updated 10 months ago
feifeibear / long-context-attention
View on GitHub
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
☆682May 21, 2026Updated 2 months ago
haoliuhl / ringattention
View on GitHub
Large Context Attention
☆773Oct 13, 2025Updated 9 months ago
sail-sg / zero-bubble-pipeline-parallelism
View on GitHub
Zero Bubble Pipeline Parallelism
☆464May 7, 2025Updated last year
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
exists-forall / striped_attention
View on GitHub
☆49Nov 10, 2023Updated 2 years ago
glassroom / heinsen_attention
View on GitHub
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆25Jun 6, 2024Updated 2 years ago
fanshiqing / grouped_gemm
View on GitHub
PyTorch bindings for CUTLASS grouped GEMM.
☆192Apr 8, 2026Updated 3 months ago
jzhang38 / EasyContext
View on GitHub
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
☆760Sep 27, 2024Updated last year
ByteDance-Seed / Triton-distributed
View on GitHub
Distributed Compiler based on Triton for Parallel Systems
☆1,498Updated this week
lm-sys / llm-decontaminator
View on GitHub
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
☆325Dec 20, 2023Updated 2 years ago
OpenNLPLab / LASP
View on GitHub
Linear Attention Sequence Parallelism (LASP)
☆87Jun 4, 2024Updated 2 years ago
GindaChen / FlexFlashAttention3
View on GitHub
FlexAttention w/ FlashAttention3 Support
☆27Oct 5, 2024Updated last year
lucidrains / ring-attention-pytorch
View on GitHub
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
☆546May 16, 2025Updated last year
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
DachengLi1 / LongChat
View on GitHub
Official repository for LongChat and LongEval
☆536May 24, 2024Updated 2 years ago
HKUNLP / ChunkLlama
View on GitHub
[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
☆450Oct 16, 2024Updated last year
microsoft / ParrotServe
View on GitHub
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆223Sep 21, 2024Updated last year
Ryu1845 / hyena-jax
View on GitHub
Implementation of Hyena Hierarchy in JAX
☆10Apr 30, 2023Updated 3 years ago
tgale96 / grouped_gemm
View on GitHub
PyTorch bindings for CUTLASS grouped GEMM.
☆154May 29, 2025Updated last year
shawntan / scattermoe
View on GitHub
Triton-based implementation of Sparse Mixture of Experts.
☆281Oct 3, 2025Updated 9 months ago
parasailteam / coconet
View on GitHub
☆85Dec 2, 2022Updated 3 years ago
punica-ai / punica
View on GitHub
Serving multiple LoRA finetuned LLM as one
☆1,168May 8, 2024Updated 2 years ago
flexflow / flexflow-train
View on GitHub
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
☆1,898Updated this week
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
FasterDecoding / Medusa
View on GitHub
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
☆2,758Jun 25, 2024Updated 2 years ago
tlc-pack / libflash_attn
View on GitHub
Standalone Flash Attention v2 kernel without libtorch dependency
☆113Sep 10, 2024Updated last year
jquesnelle / yarn
View on GitHub
YaRN: Efficient Context Window Extension of Large Language Models
☆1,740Apr 17, 2024Updated 2 years ago
Azure / MS-AMP
View on GitHub
Microsoft Automatic Mixed Precision Library
☆636Dec 1, 2025Updated 7 months ago
MayDomine / Burst-Attention
View on GitHub
Distributed IO-aware Attention algorithm
☆24Sep 24, 2025Updated 10 months ago
FMInference / H2O
View on GitHub
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
☆530Aug 1, 2024Updated last year
mit-han-lab / x-attention
View on GitHub
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
☆280Jul 6, 2025Updated last year
zhuohan123 / terapipe
View on GitHub
☆79May 4, 2021Updated 5 years ago
hao-ai-lab / LookaheadDecoding
View on GitHub
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
☆1,340Mar 6, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
bytedance / flux
View on GitHub
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,345Aug 28, 2025Updated 10 months ago
alpa-projects / alpa
View on GitHub
Training and serving large-scale neural networks with auto parallelization.
☆3,180Dec 9, 2023Updated 2 years ago
NVIDIA / TransformerEngine
View on GitHub
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆3,448Updated this week
Dao-AILab / sonic-moe
View on GitHub
Accelerating MoE with IO and Tile-aware Optimizations
☆732Jul 4, 2026Updated 3 weeks ago
alibaba / Megatron-LLaMA
View on GitHub
Best practice for training LLaMA models in Megatron-LM
☆666Jan 2, 2024Updated 2 years ago
alan-hpc / cuda_op_benchmark
View on GitHub
方便扩展的Cuda算子理解和优化框架，仅用在学习使用
☆18Jun 13, 2024Updated 2 years ago
volcengine / veScale
View on GitHub
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
☆1,033Mar 3, 2026Updated 4 months ago