thinking-machines-lab / batch_invariant_ops
⭐423 · Updated this week
Alternatives and similar repositories for batch_invariant_ops
Users interested in batch_invariant_ops are comparing it to the libraries listed below.
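For context on what is being compared: batch_invariant_ops targets the batch-size-dependent nondeterminism of standard GPU kernels, where the same input row can produce slightly different results depending on how many rows are computed alongside it. A minimal PyTorch sketch of that effect is below; it assumes a CUDA device, and the `set_batch_invariant_mode` usage in the trailing comment is taken from the repo's README as an assumption rather than tested here.

```python
import torch

# Reproduce the batch-size nondeterminism that batch_invariant_ops addresses:
# row 0 of a matmul can differ depending on whether it is computed inside a
# large batch or alone, because the kernel's reduction order changes with
# the batch shape. Requires a CUDA device.
torch.manual_seed(0)
device = "cuda"
a = torch.randn(2048, 2048, device=device)
b = torch.randn(2048, 2048, device=device)

row_in_batch = (a @ b)[:1]  # row 0 computed as part of a 2048-row batch
row_alone = a[:1] @ b       # row 0 computed as a batch of one

# On typical GPU kernels these differ at the floating-point level:
print("max abs difference:", (row_in_batch - row_alone).abs().max().item())

# Assumption (per the repo README): the library exposes a context manager
# that swaps in batch-invariant kernels, under which the results match:
# from batch_invariant_ops import set_batch_invariant_mode
# with set_batch_invariant_mode():
#     assert torch.equal((a @ b)[:1], a[:1] @ b)
```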
- 🔥 A minimal training framework for scaling FLA models · ⭐236 · Updated 3 weeks ago
- Efficient Triton implementation of Native Sparse Attention. · ⭐215 · Updated 3 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer · ⭐181 · Updated 2 months ago
- ⭐84 · Updated 6 months ago
- ⭐242 · Updated 3 months ago
- Physics of Language Models, Part 4 · ⭐242 · Updated last month
- Triton-based implementation of Sparse Mixture of Experts. · ⭐238 · Updated 2 weeks ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. · ⭐140 · Updated this week
- ByteCheckpoint: A Unified Checkpointing Library for LFMs · ⭐243 · Updated 2 months ago
- Implementation of FP8/INT8 Rollout for RL training without performance drop. · ⭐187 · Updated last week
- An efficient implementation of the NSA (Native Sparse Attention) kernel · ⭐114 · Updated 2 months ago
- Fast and memory-efficient exact attention · ⭐69 · Updated 6 months ago
- ⭐124 · Updated 3 months ago
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines · ⭐516 · Updated this week
- Async pipelined version of Verl · ⭐116 · Updated 5 months ago
- Load compute kernels from the Hub · ⭐271 · Updated this week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters · ⭐129 · Updated 9 months ago
- Some preliminary explorations of Mamba's context scaling. · ⭐217 · Updated last year
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" · ⭐240 · Updated 3 months ago
- Normalized Transformer (nGPT) · ⭐188 · Updated 9 months ago
- ⭐141 · Updated 6 months ago
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training · ⭐216 · Updated last year
- The evaluation framework for training-free sparse attention in LLMs · ⭐91 · Updated 2 months ago
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ⭐265 · Updated last month
- Understand and test language model architectures on synthetic tasks. · ⭐224 · Updated last month
- Code for the paper [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference · ⭐140 · Updated 3 months ago
- Triton implementation of FlashAttention2 that adds Custom Masks. · ⭐134 · Updated last year
- [ICML 2024] CLLMs: Consistency Large Language Models · ⭐402 · Updated 9 months ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… · ⭐145 · Updated last year
- ring-attention experiments · ⭐150 · Updated 10 months ago