ring-attention experiments
⭐167 · Oct 17, 2024 · Updated last year
Alternatives and similar repositories for ring-attention
Users that are interested in ring-attention are comparing it to the libraries listed below.
- Ring attention implementation with flash attention · ⭐1,024 · Sep 10, 2025 · Updated 9 months ago
- Implementation of Ring Attention, from Liu et al. at Berkeley AI, in Pytorch · ⭐547 · May 16, 2025 · Updated last year
- Code for "What really matters in matrix-whitening optimizers?" · ⭐24 · Oct 31, 2025 · Updated 7 months ago
- Large Context Attention · ⭐773 · Oct 13, 2025 · Updated 7 months ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference · ⭐672 · May 21, 2026 · Updated 3 weeks ago
- Sample Codes using NVSHMEM on Multi-GPU · ⭐30 · Jan 22, 2023 · Updated 3 years ago
- ⭐336 · Updated this week
- DeeperGEMM: crazy optimized version · ⭐86 · May 5, 2025 · Updated last year
- PyTorch bindings for CUTLASS grouped GEMM · ⭐152 · May 29, 2025 · Updated last year
- ⭐57 · Feb 24, 2026 · Updated 3 months ago
- An experimental communicating attention kernel based on DeepEP · ⭐34 · Jul 29, 2025 · Updated 10 months ago
- FlexAttention w/ FlashAttention3 Support · ⭐27 · Oct 5, 2024 · Updated last year
- Triton-based implementation of Sparse Mixture of Experts · ⭐278 · Oct 3, 2025 · Updated 8 months ago
- ⭐92 · Feb 29, 2024 · Updated 2 years ago
- ⭐13 · Jan 7, 2025 · Updated last year
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training · ⭐223 · Aug 19, 2024 · Updated last year
- extensible collectives library in triton · ⭐98 · Mar 31, 2025 · Updated last year
- Collection of kernels written in Triton language · ⭐196 · Jan 27, 2026 · Updated 4 months ago
- Odysseus: Playground of LLM Sequence Parallelism · ⭐80 · Jun 17, 2024 · Updated last year
- ⭐22 · May 5, 2025 · Updated last year
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate · ⭐881 · Updated this week
- GPU programming related news and material links · ⭐2,162 · Mar 8, 2026 · Updated 3 months ago
- Fast low-bit matmul kernels in Triton · ⭐470 · May 15, 2026 · Updated 3 weeks ago
- flash attention tutorial written in python, triton, cuda, cutlass · ⭐521 · Jan 20, 2026 · Updated 4 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton · ⭐600 · May 13, 2026 · Updated 3 weeks ago
- Write a fast kernel and see how you compare against the best humans and AI on gpumode.com · ⭐98 · May 8, 2026 · Updated last month
- A fast communication-overlapping library for tensor/expert parallelism on GPUs · ⭐1,319 · Aug 28, 2025 · Updated 9 months ago
- ⭐178 · Feb 3, 2024 · Updated 2 years ago
- A Top-Down Profiler for GPU Applications · ⭐22 · Feb 29, 2024 · Updated 2 years ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters · ⭐134 · Dec 3, 2024 · Updated last year
- Custom triton kernels for training Karpathy's nanoGPT · ⭐19 · Oct 21, 2024 · Updated last year
- FFPA: Extends FlashAttention-2 via Split-D for large headdims, 1.5x~3× speedup vs SDPA, up to 430T on H200 · ⭐305 · Updated this week
- A lightweight design for computation-communication overlap · ⭐234 · Jan 20, 2026 · Updated 4 months ago
- Puzzles for learning Triton · ⭐2,471 · Apr 1, 2026 · Updated 2 months ago
- study of cutlass · ⭐22 · Nov 10, 2024 · Updated last year
- My study note for mlsys · ⭐14 · Nov 4, 2024 · Updated last year
- ⭐267 · Jul 11, 2024 · Updated last year
- ⭐93 · Jul 5, 2024 · Updated last year
- Ship correct and fast LLM kernels to PyTorch · ⭐150 · Jan 14, 2026 · Updated 4 months ago
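The ring-attention entries in this list (Ring Attention from Liu et al., and its flash-attention variants) roughly share one core loop. The sequence is sharded across devices. Key/value blocks are passed one hop around a ring. Each device merges the blocks it receives using an online softmax. Below is a minimal single-process sketch of that loop under a few assumptions: attention is non-causal, the "ring" is only a list rotation with no real communication, and communication is not overlapped with compute. `ring_attention` and `full_attention` are hypothetical names, not the API of any repository listed above. The sketch only shows that the blockwise result matches ordinary attention.

```python
import math
import random


def full_attention(q, k, v):
    """Reference non-causal softmax attention over the whole sequence."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        mx = max(scores)
        w = [math.exp(s - mx) for s in scores]
        z = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / z
                    for c in range(len(v[0]))])
    return out


def ring_attention(q, k, v, n_devices):
    """Single-process simulation of ring attention (hypothetical helper).

    Each simulated device owns one contiguous block of Q, K and V. At every
    step it attends its query block to the KV block it currently holds, then
    the KV blocks rotate one hop around the ring. Partial results are merged
    with an online softmax, so no device builds the full score matrix.
    """
    n = len(q)
    assert n % n_devices == 0, "sequence length must divide evenly"
    bs = n // n_devices
    d, dv = len(q[0]), len(v[0])

    def split(x):
        return [x[i * bs:(i + 1) * bs] for i in range(n_devices)]

    q_blk, k_blk, v_blk = split(q), split(k), split(v)
    # Per-device, per-query running state: max score seen so far, softmax
    # denominator, and the unnormalized weighted sum of values.
    run_max = [[-math.inf] * bs for _ in range(n_devices)]
    denom = [[0.0] * bs for _ in range(n_devices)]
    acc = [[[0.0] * dv for _ in range(bs)] for _ in range(n_devices)]

    held = list(range(n_devices))  # which KV block each device holds now
    for _ in range(n_devices):
        for dev in range(n_devices):
            kb, vb = k_blk[held[dev]], v_blk[held[dev]]
            for i, qi in enumerate(q_blk[dev]):
                scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
                          for kj in kb]
                new_max = max(run_max[dev][i], max(scores))
                scale = math.exp(run_max[dev][i] - new_max)  # rescale old partials
                w = [math.exp(s - new_max) for s in scores]
                denom[dev][i] = denom[dev][i] * scale + sum(w)
                acc[dev][i] = [a * scale + sum(wj * vj[c] for wj, vj in zip(w, vb))
                               for c, a in enumerate(acc[dev][i])]
                run_max[dev][i] = new_max
        # Pass KV one hop along the ring: device dev receives from dev - 1.
        held = [held[(dev - 1) % n_devices] for dev in range(n_devices)]

    return [[a / denom[dev][i] for a in acc[dev][i]]
            for dev in range(n_devices) for i in range(bs)]


if __name__ == "__main__":
    random.seed(0)
    n, d = 8, 4
    q, k, v = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
               for _ in range(3))
    ref = full_attention(q, k, v)
    out = ring_attention(q, k, v, n_devices=4)
    err = max(abs(a - b) for ra, rb in zip(ref, out) for a, b in zip(ra, rb))
    print(f"max abs difference vs. full attention: {err:.2e}")
```

The printed difference should be at floating-point round-off level. In a real multi-GPU setting, the list rotation would become peer-to-peer sends between neighbouring devices. Those sends would typically be overlapped with the blockwise compute, which is the part the libraries above optimize.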