feifeibear/Odysseus-Transformer

Odysseus: Playground of LLM Sequence Parallelism

☆83

Alternatives and similar repositories for Odysseus-Transformer

Users who are interested in Odysseus-Transformer are comparing it to the libraries listed below.


feifeibear / long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
☆682 · May 21, 2026 · Updated 2 months ago
feifeibear / ChituAttention
Quantized Attention on GPU
☆45 · Nov 22, 2024 · Updated last year
zhuzilin / ring-flash-attention
Ring attention implementation with flash attention
☆1,037 · Sep 10, 2025 · Updated 10 months ago
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 · Nov 19, 2024 · Updated last year
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆155 · Dec 4, 2024 · Updated last year
awslabs / Lancet-Accelerating-MoE-Training-via-Whole-Graph-Computation-Communication-Overlapping
Official implementation for the paper Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapp…
☆14 · May 20, 2026 · Updated 2 months ago
tile-ai / AttentionEngine
☆52 · May 19, 2025 · Updated last year
yikangshen / megablocks
☆20 · May 30, 2024 · Updated 2 years ago
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆192 · Apr 8, 2026 · Updated 3 months ago
jzhang38 / EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
☆760 · Sep 27, 2024 · Updated last year
tlc-pack / libflash_attn
Standalone Flash Attention v2 kernel without libtorch dependency
☆113 · Sep 10, 2024 · Updated last year
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆281 · Oct 3, 2025 · Updated 9 months ago
KnowingNothing / MatmulTutorial
An easy-to-understand TensorOp matmul tutorial
☆445 · Mar 5, 2026 · Updated 4 months ago
PipeFusion / PipeFusion
A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters
☆58 · May 3, 2026 · Updated 2 months ago
google / iopddl
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆25 · May 12, 2025 · Updated last year
xdit-project / DiTCacheAnalysis
An auxiliary project analyzing the characteristics of KV in DiT attention.
☆34 · Nov 29, 2024 · Updated last year
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well-known deployments.
☆102 · Sep 19, 2025 · Updated 10 months ago
weishengying / cutlass_flash_atten_fp8
Implements FP8 flash attention on the Ada architecture using the CUTLASS repository
☆82 · Aug 12, 2024 · Updated last year
yifuwang / symm-mem-recipes
☆170 · Dec 27, 2024 · Updated last year
BBuf / flash-rwkv
☆32 · May 26, 2024 · Updated 2 years ago
chengzeyi / piflux
(WIP) Parallel inference for black-forest-labs' FLUX model.
☆19 · Nov 18, 2024 · Updated last year
xdit-project / xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
☆2,662 · Jul 14, 2026 · Updated last week
TZW1998 / ParaTAA-Diffusion
This is the official repo for the paper "Accelerating Parallel Sampling of Diffusion Models", Tang et al., ICML 2024, https://openreview.net…
☆16 · Jul 19, 2024 · Updated 2 years ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆242 · Jan 20, 2026 · Updated 6 months ago
feifeibear / SeeReel
Agent-native Seedance 2.0 short-film studio: CLI for AI, canvas for humans
☆15 · Jun 14, 2026 · Updated last month
mit-han-lab / duo-attention
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
☆539 · Feb 10, 2025 · Updated last year
proger / nanokitchen
Parallel Associative Scan for Language Models
☆18 · Jan 8, 2024 · Updated 2 years ago
microsoft / BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
☆769 · Aug 6, 2025 · Updated 11 months ago
JF-D / Parcae
☆22 · Apr 22, 2024 · Updated 2 years ago
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆477 · Jul 15, 2026 · Updated last week
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆464 · May 7, 2025 · Updated last year
OpenNLPLab / lightning-attention
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
☆344 · Feb 23, 2025 · Updated last year
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,345 · Aug 28, 2025 · Updated 10 months ago
Aleph-Alpha-Research / NeurIPS-WANT-submission-efficient-parallelization-layouts
☆22 · Dec 15, 2023 · Updated 2 years ago
yester31 / Cutlass_EX
Study of CUTLASS
☆22 · Nov 10, 2024 · Updated last year
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆34 · Jul 29, 2025 · Updated 11 months ago
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆506 · Jul 17, 2026 · Updated last week
gpu-mode / ring-attention
ring-attention experiments
☆171 · Oct 17, 2024 · Updated last year
KuangjuX / cuda-evolve-oss
Autonomous GPU kernel optimization system driven by AI agents.
☆31 · Mar 29, 2026 · Updated 3 months ago