☆979 · Nov 4, 2025 · Updated 4 months ago
Alternatives and similar repositories for batch_invariant_ops
Users that are interested in batch_invariant_ops are comparing it to the libraries listed below.
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆51 · Jul 4, 2025 · Updated 8 months ago
- SkyRL: A Modular Full-stack RL Library for LLMs ☆1,699 · Mar 18, 2026 · Updated last week
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆96 · Mar 5, 2026 · Updated 2 weeks ago
- DeeperGEMM: crazy optimized version ☆75 · May 5, 2025 · Updated 10 months ago
- ☆39 · Dec 14, 2025 · Updated 3 months ago
- ☆52 · May 19, 2025 · Updated 10 months ago
- Distributed Compiler based on Triton for Parallel Systems ☆1,394 · Mar 11, 2026 · Updated last week
- Implementation from scratch in C of the Multi-head latent attention used in the Deepseek-v3 technical paper. ☆18 · Jan 15, 2025 · Updated last year
- Kernels, of the mega variety :) ☆693 · Updated this week
- Supporting code for the blog post on modular manifolds. ☆120 · Sep 26, 2025 · Updated 5 months ago
- Tile primitives for speedy kernels ☆3,244 · Mar 17, 2026 · Updated last week
- 🚀 Efficient implementations of state-of-the-art linear attention models ☆4,692 · Updated this week
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆925 · Feb 28, 2026 · Updated 3 weeks ago
- FlashInfer: Kernel Library for LLM Serving ☆5,194 · Updated this week
- Muon is Scalable for LLM Training ☆1,446 · Aug 3, 2025 · Updated 7 months ago
- slime is an LLM post-training framework for RL Scaling. ☆4,906 · Updated this week
- A Rust reimplementation of genai-bench for benchmarking LLM serving systems at high concurrency with accurate timing and industry-standar… ☆279 · Mar 18, 2026 · Updated last week
- ☆234 · Nov 19, 2025 · Updated 4 months ago
- [ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter ☆156 · Feb 27, 2026 · Updated 3 weeks ago
- A Quirky Assortment of CuTe Kernels ☆863 · Updated this week
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels ☆5,403 · Updated this week
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,232 · Aug 27, 2025 · Updated 6 months ago
- A bibliography and survey of the papers surrounding o1 ☆1,213 · Nov 16, 2024 · Updated last year
- Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible. ☆4,855 · Updated this week
- Triton-based implementation of Sparse Mixture of Experts. ☆270 · Oct 3, 2025 · Updated 5 months ago
- Helpful tools and examples for working with flex-attention ☆1,161 · Feb 8, 2026 · Updated last month
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆131 · Jun 24, 2025 · Updated 9 months ago
- Analyze computation-communication overlap in V3/R1. ☆1,149 · Mar 21, 2025 · Updated last year
- Build compute kernels and load them from the Hub. ☆518 · Updated this week
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆809 · Updated this week
- Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime. ☆1,001 · Updated this week
- a simple API to use CUPTI ☆10 · Aug 19, 2025 · Updated 7 months ago
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang ☆44 · Nov 19, 2025 · Updated 4 months ago
- Ring attention implementation with flash attention ☆996 · Sep 10, 2025 · Updated 6 months ago
- verl: Volcano Engine Reinforcement Learning for LLMs ☆20,097 · Updated this week
- Collections of RLxLM experiments using minimal codes ☆14 · Feb 17, 2025 · Updated last year
- Scalable toolkit for efficient model reinforcement ☆1,447 · Updated this week
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆24,829 · Updated this week