Fast and memory-efficient exact attention
☆29 · Dec 2, 2024 · Updated last year
Alternatives and similar repositories for flash-attention-3
Users who are interested in flash-attention-3 are comparing it to the libraries listed below.
- Basic world models ☆31 · Oct 30, 2025 · Updated 4 months ago
- SGLang Kernel Wheel Index ☆17 · Updated this week
- CoCo: CoCo as CoT for Text-to-Image Preview and Rare Concept Generation ☆46 · Mar 10, 2026 · Updated last week
- An extension of the GaLore paper that performs natural gradient descent in a low-rank subspace ☆18 · Oct 21, 2024 · Updated last year
- Balanced K-means in PyTorch with strong GPU acceleration ☆12 · Apr 30, 2020 · Updated 5 years ago
- ☆53 · Aug 28, 2024 · Updated last year
- [ICLR'25] Official repository of the paper: Ranking-aware adapter for text-driven image ordering with CLIP ☆16 · Apr 17, 2025 · Updated 11 months ago
- ☆27 · May 3, 2024 · Updated last year
- A minimal implementation of FlashAttention v2 for learning purposes ☆15 · Jun 12, 2024 · Updated last year
- Furthest Point Sampling with support for N-dimensional tensors (CUDA version) ☆12 · May 6, 2021 · Updated 4 years ago
- Physics Master is a model fine-tuned from llama3-8B-Instruct. It can answer your physics questions! ☆16 · Aug 24, 2024 · Updated last year
- Joint image and depth inpainting (ldm3d) ☆16 · Apr 28, 2024 · Updated last year
- ☆15 · Apr 28, 2023 · Updated 2 years ago
- ☆17 · Apr 9, 2025 · Updated 11 months ago
- E-Graph library ☆22 · Apr 4, 2024 · Updated last year
- Official implementation of SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization ☆56 · Updated this week
- RADIX-4 SRT division ☆12 · Oct 31, 2019 · Updated 6 years ago
- In this project, we propose to study Vision Transformers trained using the Barlow Twins self-supervised method, and compare the results w… ☆16 · Oct 3, 2023 · Updated 2 years ago
- KSimply: An AI Potential Analyzer that recommends open-source models based on user hardware. ☆15 · Updated this week
- ☆24 · Feb 5, 2026 · Updated last month
- A CUDA kernel for NHWC GroupNorm for PyTorch ☆23 · Nov 15, 2024 · Updated last year
- This project implements the Titans architecture from the paper "Titans: Learning to Memorize at Test Time" for market data prediction. ☆11 · Jan 19, 2025 · Updated last year
- Official implementation of "Deep Magnification-Flexible Upsampling over 3D Point Clouds" ☆13 · Dec 2, 2021 · Updated 4 years ago
- LLM inference in C/C++ ☆20 · Oct 22, 2025 · Updated 5 months ago
- ☆21 · Jun 26, 2023 · Updated 2 years ago
- [ACM MM 2025] LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMs ☆15 · Feb 10, 2026 · Updated last month
- A fast, lightweight, and extensible RWKV chat UI powered by Flutter. Offline-ready, multi-backend support, ideal for local RWKV inference… ☆83 · Updated this week
- ☆16 · Oct 20, 2025 · Updated 5 months ago
- ☆15 · Jun 5, 2023 · Updated 2 years ago
- A PyTorch implementation of [VCT](https://github.com/google-research/google-research/tree/master/vct) ☆10 · Nov 25, 2022 · Updated 3 years ago
- Geometry-aware Novel View Synthesis with Pre-trained 2D Prior ☆39 · Jun 3, 2023 · Updated 2 years ago
- Basic floating-point components for RISC-V processors ☆12 · Aug 13, 2017 · Updated 8 years ago
- An implementation of run-length encoding for PyTorch tensors using CUDA ☆14 · Apr 7, 2021 · Updated 4 years ago
- Exploring Motion Ambiguity and Alignment for High-Quality Video Frame Interpolation (CVPR 2023) ☆14 · Jul 21, 2023 · Updated 2 years ago
- Experimental RISC-V assembler code snippets ☆10 · Oct 23, 2019 · Updated 6 years ago
- ☆48 · Feb 23, 2025 · Updated last year
- Model souping for LLMs ☆72 · Nov 18, 2025 · Updated 4 months ago
- Geometric algebra attention mechanism for TensorFlow, Keras, PyTorch, and JAX ☆22 · Jan 24, 2024 · Updated 2 years ago
- Neural Distributed Image Compression using Cross-Attention Feature Alignment (NDIC-CAM) [WACV 2023] ☆12 · Jul 19, 2022 · Updated 3 years ago