flex-block-attn: an efficient block sparse attention computation library
☆127Dec 26, 2025Updated 3 months ago
Alternatives and similar repositories for flex-block-attn
Users that are interested in flex-block-attn are comparing it to the libraries listed below.
- Vortex: A Flexible and Efficient Sparse Attention Framework☆49Jan 21, 2026Updated 2 months ago
- An experimental communicating attention kernel based on DeepEP.☆35Jul 29, 2025Updated 7 months ago
- A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training☆676Mar 16, 2026Updated last week
- This is the official PyTorch implementation of "BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation."☆40Oct 9, 2025Updated 5 months ago
- High-performance RMSNorm implementation using SM core storage (registers and shared memory)☆30Jan 22, 2026Updated 2 months ago
- FlexAttention w/ FlashAttention3 Support☆27Oct 5, 2024Updated last year
- [EMNLP 2024] Quantize LLM to extremely low-bit, and finetune the quantized LLMs☆15Jul 18, 2024Updated last year
- ☆48Dec 13, 2025Updated 3 months ago
- ☆33Dec 10, 2025Updated 3 months ago
- Depth-Bounded PCFG Induction☆13Apr 19, 2019Updated 6 years ago
- ☆38Aug 7, 2025Updated 7 months ago
- Code repository for ICLR 2025 paper "LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid"☆25Mar 2, 2025Updated last year
- Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation"☆139Oct 17, 2025Updated 5 months ago
- Code Release for "On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies"☆16Apr 13, 2021Updated 4 years ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆41Oct 11, 2024Updated last year
- ☆12Mar 4, 2022Updated 4 years ago
- Benchmark tests supporting the TiledCUDA library.☆18Nov 19, 2024Updated last year
- 🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× …☆103Sep 8, 2025Updated 6 months ago
- Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…☆51Oct 21, 2023Updated 2 years ago
- Source-to-Source Debuggable Derivatives in Pure Python☆15Jan 23, 2024Updated 2 years ago
- ☆25Jun 19, 2025Updated 9 months ago
- [CVPR 2026] Official pytorch implementation of "ReDirector: Creating Any-Length Video Retakes with Rotary Camera Encoding"☆22Dec 17, 2025Updated 3 months ago
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning☆144Feb 25, 2026Updated last month
- [ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention☆646Mar 6, 2026Updated 2 weeks ago
- Triton Implementation of Flash Attention with Bias.☆22Apr 16, 2025Updated 11 months ago
- A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments.☆30Mar 28, 2025Updated 11 months ago
- Open Source Speech/Text Data on AI☆19Sep 13, 2022Updated 3 years ago
- Helpful tools and examples for working with flex-attention☆1,161Feb 8, 2026Updated last month
- The official implementation of Distribution Backtracking Distillation for One-step Diffusion Models☆32Jan 25, 2025Updated last year
- ☆16Dec 12, 2023Updated 2 years ago
- Xmixers: A collection of SOTA efficient token/channel mixers☆28Sep 4, 2025Updated 6 months ago
- ☆63Jun 12, 2025Updated 9 months ago
- Official code for the paper "Attention as a Hypernetwork"☆55Feb 24, 2026Updated last month
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer☆172Feb 11, 2026Updated last month
- Parallel Associative Scan for Language Models☆18Jan 8, 2024Updated 2 years ago
- [AAAI 2024] DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning☆15Apr 29, 2024Updated last year
- Wan: Open and Advanced Large-Scale Video Generative Models☆28Jul 28, 2025Updated 7 months ago
- Framework to reduce autotune overhead to zero for well known deployments.☆97Sep 19, 2025Updated 6 months ago
- [Arxiv 2025] In-Video Instructions: Visual Signals as Generative Control☆45Nov 25, 2025Updated 4 months ago