Fast and memory-efficient exact attention
☆31Dec 2, 2024Updated last year
Alternatives and similar repositories for flash-attention-3
Users that are interested in flash-attention-3 are comparing it to the libraries listed below.
- CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation☆51Apr 9, 2026Updated last month
- SGLang Kernel Wheel Index☆22May 15, 2026Updated last week
- ☆24Jun 18, 2024Updated last year
- An extension to the GaLore paper, to perform Natural Gradient Descent in low rank subspace☆19Oct 21, 2024Updated last year
- Benchmark structured generation libraries☆31Oct 25, 2024Updated last year
- Balanced K-means in Pytorch with strong GPU acceleration☆12Apr 30, 2020Updated 6 years ago
- ☆53Aug 28, 2024Updated last year
- [ICLR'25] Official repository of paper: Ranking-aware adapter for text-driven image ordering with CLIP☆16Apr 17, 2025Updated last year
- ☆27May 3, 2024Updated 2 years ago
- ☆13May 13, 2026Updated last week
- Implement FlashAttention v2 with minimal code to learn.☆16Jun 12, 2024Updated last year
- ☆13Jan 15, 2023Updated 3 years ago
- A rust version of the Caffe library.☆19Jun 16, 2021Updated 4 years ago
- Joint image and Depth inpainting, ldm3d☆16Apr 28, 2024Updated 2 years ago
- ☆17Apr 9, 2025Updated last year
- Teaching materials for improving research software writing abilities.☆14Apr 16, 2026Updated last month
- CUDA SGEMM optimization note☆15Oct 31, 2023Updated 2 years ago
- Hardware Division Units☆10Jul 17, 2014Updated 11 years ago
- Official implementation for SSDD Single-Step Diffusion Decoder for Efficient Image Tokenization.☆63Mar 16, 2026Updated 2 months ago
- RADIX-4 SRT division☆12Oct 31, 2019Updated 6 years ago
- A CUDA kernel for NHWC GroupNorm for PyTorch☆23Nov 15, 2024Updated last year
- A graph coloring register allocator for LLVM.☆11Jan 23, 2017Updated 9 years ago
- Implementation of a holodeck, written in Pytorch☆19Nov 1, 2023Updated 2 years ago
- ☆21Jun 26, 2023Updated 2 years ago
- ☆11Jun 4, 2024Updated last year
- Triton implementation of Flash Attention2.0☆54Jul 31, 2023Updated 2 years ago
- ☆15Jun 5, 2023Updated 2 years ago
- ☆16Oct 20, 2025Updated 7 months ago
- Geometry-aware Novel View Synthesis with Pre-trained 2D Prior☆39Jun 3, 2023Updated 2 years ago
- (Verilog) A simple convolution layer implementation with systolic array structure☆13May 9, 2022Updated 4 years ago
- Basic floating-point components for RISC-V processors☆12Aug 13, 2017Updated 8 years ago
- Associative scan package for DRYing some code between repos☆18Jan 5, 2026Updated 4 months ago
- Unsigned Radix-2 SRT division (radix-2 division)☆16May 12, 2015Updated 11 years ago
- Unofficial implementation of PU-Transformer☆19Jul 15, 2022Updated 3 years ago
- Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation (CVPR2023)☆14Jul 21, 2023Updated 2 years ago
- Experimental RISC-V assembler code snippets☆10Oct 23, 2019Updated 6 years ago
- [IEEE PCS 2022 best paper finalist] "FloLPIPS: A Bespoke Video Quality Metric for Frame Interpolation", Duolikun Danier, Fan Zhang, David …☆22Mar 9, 2024Updated 2 years ago
- ☆49Feb 23, 2025Updated last year
- [TVCG 2023] PCDNF: Revisiting Learning-based Point Cloud Denoising via Joint Normal Filtering☆18Aug 4, 2023Updated 2 years ago