A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, rocWMMA), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA environments.
☆51 · Aug 25, 2024 · Updated last year
Alternatives and similar repositories for flash-attention-v2-RDNA3-minimal
Users that are interested in flash-attention-v2-RDNA3-minimal are comparing it to the libraries listed below.
- Simple monkeypatch to boost AMD Navi 3 GPUs · ☆48 · Apr 21, 2025 · Updated 11 months ago
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… · ☆12 · Jun 24, 2024 · Updated last year
- Fast and memory-efficient exact attention ported to ROCm · ☆13 · Dec 1, 2023 · Updated 2 years ago
- A tiny implementation of in-place FFT. The performance is comparable to FFTW3 for length 2^17 to 2^20. · ☆15 · Jul 24, 2018 · Updated 7 years ago
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi… · ☆23 · Oct 1, 2025 · Updated 5 months ago
- LLM training in simple, raw C/HIP for AMD GPUs · ☆61 · Sep 23, 2024 · Updated last year
- Flash Attention in raw CUDA C beating PyTorch · ☆38 · May 14, 2024 · Updated last year
- Optimized FP16/BF16 x FP4 GPU kernels for AMD GPUs · ☆45 · Feb 21, 2026 · Updated last month
- Implement FlashAttention v2 with minimal code to learn. · ☆15 · Jun 12, 2024 · Updated last year
- Running ComfyUI with AMD + ZLUDA (Windows) · ☆37 · Nov 2, 2024 · Updated last year
- Image processing tool for ComfyUI · ☆13 · Aug 6, 2025 · Updated 7 months ago
- Installation script for AI applications using ROCm on Linux. · ☆40 · Mar 9, 2026 · Updated 2 weeks ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. · ☆16 · Aug 31, 2023 · Updated 2 years ago
- ComfyUI custom nodes for RVC related inference and image generation · ☆37 · Oct 15, 2025 · Updated 5 months ago
- Everything you need to set up on your AMD system for machine learning · ☆19 · Jul 31, 2025 · Updated 7 months ago
- Fast and memory-efficient exact attention · ☆224 · Updated this week
- ☆159 · Sep 15, 2023 · Updated 2 years ago
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs · ☆27 · Dec 17, 2024 · Updated last year
- Standalone Flash Attention v2 kernel without libtorch dependency · ☆112 · Sep 10, 2024 · Updated last year
- RWKV, in easy to read code · ☆73 · Mar 25, 2025 · Updated 11 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo · ☆113 · Updated this week
- Guides to hopefully simplify the process of using ROCm. · ☆12 · Sep 26, 2024 · Updated last year
- The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm · ☆869 · Updated this week
- Convenient, fast text-to-speech with WhisperSpeech by Collabora; train a voice on the fly in ComfyUI · ☆43 · Mar 9, 2025 · Updated last year
- Quick and easy Diffusers CLI · ☆15 · Mar 16, 2026 · Updated last week
- AutoHotKey script to translate joystick movement to keypresses. · ☆12 · Jun 9, 2014 · Updated 11 years ago
- ☆15 · Feb 23, 2025 · Updated last year
- 8-bit CUDA functions for PyTorch · ☆70 · Sep 24, 2025 · Updated 5 months ago
- Development repository for the Triton language and compiler · ☆143 · Updated this week
- YOLOX with NCNN/MNN/TNN/ONNXRuntime C++. · ☆13 · Dec 18, 2021 · Updated 4 years ago
- A low-cost, high-performance deep learning training framework that enables efficient 100B-scale model fine-tuning on a commodity server w… · ☆24 · Mar 21, 2025 · Updated last year
- ☆87 · Jan 23, 2025 · Updated last year
- hipDF - GPU DataFrame Library · ☆16 · Mar 16, 2026 · Updated last week
- ☆67 · Oct 25, 2025 · Updated 4 months ago
- ☆12 · Feb 7, 2018 · Updated 8 years ago
- This project is about convolution operator optimization on GPU, including GEMM-based (implicit GEMM) convolution. · ☆42 · Sep 29, 2025 · Updated 5 months ago
- 🤖 Telegram chatbot frontend for Searx. · ☆15 · Nov 25, 2018 · Updated 7 years ago
- Repo for source files of Avnet MicroZed carrier boards · ☆12 · Jan 9, 2025 · Updated last year
- A beautiful telnet/ssh client optimized for Mandarin BBS · ☆21 · Sep 8, 2009 · Updated 16 years ago