flex-block-attn: an efficient block sparse attention computation library
☆131Dec 26, 2025Updated 5 months ago
Alternatives and similar repositories for flex-block-attn
Users that are interested in flex-block-attn are comparing it to the libraries listed below.
- An experimental communicating attention kernel based on DeepEP.☆34Jul 29, 2025Updated 9 months ago
- [ICLR 2026] This is the official PyTorch implementation of "BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Gen…☆43Oct 9, 2025Updated 7 months ago
- High-performance RMSNorm implementation using SM core storage (registers and shared memory)☆30Jan 22, 2026Updated 4 months ago
- FlexAttention w/ FlashAttention3 Support☆27Oct 5, 2024Updated last year
- ☆47Dec 13, 2025Updated 5 months ago
- ☆11Jan 2, 2022Updated 4 years ago
- Depth-Bounded PCFG Induction☆13Apr 19, 2019Updated 7 years ago
- ☆37Aug 7, 2025Updated 9 months ago
- Code Release for "On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies"☆16Apr 13, 2021Updated 5 years ago
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆41Oct 11, 2024Updated last year
- Benchmark tests supporting the TiledCUDA library.☆19Nov 19, 2024Updated last year
- Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…☆51Oct 21, 2023Updated 2 years ago
- trying to reproduce suno v3☆34Jan 29, 2025Updated last year
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning☆150Feb 25, 2026Updated 3 months ago
- [ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention☆665Mar 6, 2026Updated 2 months ago
- Triton Implementation of Flash Attention with Bias.☆24Apr 16, 2025Updated last year
- Open Source Speech/Text Data on AI☆19Sep 13, 2022Updated 3 years ago
- Helpful tools and examples for working with flex-attention☆1,190Apr 13, 2026Updated last month
- ☆17Dec 12, 2023Updated 2 years ago
- Xmixers: A collection of SOTA efficient token/channel mixers☆28Sep 4, 2025Updated 8 months ago
- ☆63Jun 12, 2025Updated 11 months ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer☆190Feb 11, 2026Updated 3 months ago
- Parallel Associative Scan for Language Models☆18Jan 8, 2024Updated 2 years ago
- [CVPR 2026] Official pytorch implementation of "ReDirector: Creating Any-Length Video Retakes with Rotary Camera Encoding"☆29Dec 17, 2025Updated 5 months ago
- Official code for the paper "Attention as a Hypernetwork"☆57Feb 24, 2026Updated 3 months ago
- Asynchronous pipeline parallel optimization☆21Feb 2, 2026Updated 3 months ago
- Framework to reduce autotune overhead to zero for well known deployments.☆100Sep 19, 2025Updated 8 months ago
- [Arxiv 2025] In-Video Instructions: Visual Signals as Generative Control☆45Nov 25, 2025Updated 6 months ago
- Wan: Open and Advanced Large-Scale Video Generative Models☆29Jul 28, 2025Updated 9 months ago
- Learning to Model Editing Processes☆26Aug 3, 2025Updated 9 months ago
- ☆28Oct 2, 2025Updated 7 months ago
- A sparse attention kernel supporting mix sparse patterns☆516Jan 18, 2026Updated 4 months ago
- Simple and efficient pytorch-native transformer training and inference (batched)☆78Apr 2, 2024Updated 2 years ago
- Distributed Compiler based on Triton for Parallel Systems☆1,440Apr 22, 2026Updated last month
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.☆355Updated this week
- [ICLR 2026] Adapting Self-Supervised Representations as a Latent Space for Efficient Generation☆54Apr 24, 2026Updated last month
- [ICML 2025] Official implementation of the paper "Compressed Image Generation with Denoising Diffusion Codebook Models"☆86Aug 10, 2025Updated 9 months ago
- TinyML and Efficient Deep Learning Computing☆20Apr 26, 2024Updated 2 years ago
- A list of language models with permissive licenses such as MIT or Apache 2.0☆24Feb 28, 2025Updated last year