A simple Flash Attention v2 implementation using ROCm (RDNA3 GPUs, rocWMMA), mainly intended for Stable Diffusion (ComfyUI) in Windows ZLUDA environments.
☆52 · Aug 25, 2024 · Updated last year
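The repository above implements Flash Attention v2, whose central idea is to process keys and values in tiles while keeping a running maximum and running softmax denominator per query row (the "online softmax"), so the full attention matrix is never materialized, and to defer the final normalization until all tiles are done. The following is a minimal pure-Python sketch of that idea for a single query vector, assuming plain lists instead of GPU tensors. It is not the repository's rocWMMA kernel, and the function names `naive_attention` and `tiled_attention` are hypothetical.

```python
import math


def naive_attention(q, keys, values):
    """Reference: full softmax(q.k / sqrt(d)) over all keys, then a weighted sum of values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dv = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dv)]


def tiled_attention(q, keys, values, block_size=2):
    """Online-softmax attention over key/value tiles (the FlashAttention-style recurrence)."""
    scale = 1.0 / math.sqrt(len(q))
    dv = len(values[0])
    m = -math.inf     # running max of all scores seen so far
    l = 0.0           # running sum of exp(score - m)
    acc = [0.0] * dv  # running *unnormalized* output
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        s_blk = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
        m_new = max(m, max(s_blk))
        # Rescale the old statistics to the new max; exp(-inf) == 0.0 on the first tile.
        correction = math.exp(m - m_new)
        p = [math.exp(s - m_new) for s in s_blk]
        l = l * correction + sum(p)
        acc = [a * correction + sum(pi * v[j] for pi, v in zip(p, v_blk))
               for j, a in enumerate(acc)]
        m = m_new
    # Normalize once at the end instead of after every tile.
    return [a / l for a in acc]


if __name__ == "__main__":
    q = [1.0, 0.5]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.3, -0.2]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
    ref = naive_attention(q, keys, values)
    out = tiled_attention(q, keys, values, block_size=2)
    print(all(abs(a - b) < 1e-9 for a, b in zip(ref, out)))
```

Both functions should agree for any tile size; a real kernel applies the same recurrence per query row in parallel and keeps the tiles in on-chip memory, which this sketch does not model.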
Alternatives and similar repositories for flash-attention-v2-RDNA3-minimal
Users interested in flash-attention-v2-RDNA3-minimal are comparing it to the libraries listed below.
- Simple monkeypatch to boost AMD Navi 3 GPUs ☆49 · Apr 21, 2025 · Updated last year
- AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆12 · Jun 24, 2024 · Updated last year
- Official repository for Flash Local Linear Attention ☆23 · Apr 23, 2026 · Updated last week
- ComfyUI custom nodes for DeepSeek, Qwen, GPT, and other OpenAI-compatible LLM APIs, with tools for chat, translation, vision, and JSON wo… ☆21 · Apr 23, 2026 · Updated last week
- Flash Attention in raw CUDA C beating PyTorch ☆38 · May 14, 2024 · Updated last year
- Implementation of FlashAttention v2 with minimal code, for learning purposes. ☆16 · Jun 12, 2024 · Updated last year
- Image processing tool for ComfyUI ☆13 · Aug 6, 2025 · Updated 8 months ago
- Installation script for AI applications using ROCm on Linux. ☆45 · Updated this week
- 8-bit CUDA functions for PyTorch, ROCm compatible ☆42 · Mar 26, 2024 · Updated 2 years ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios. ☆16 · Aug 31, 2023 · Updated 2 years ago
- ComfyUI custom nodes for RVC-related inference and image generation ☆38 · Oct 15, 2025 · Updated 6 months ago
- A fork of flux-fast that makes it even faster with cache-dit (3.3x speedup on NVIDIA L20). ☆24 · Jul 18, 2025 · Updated 9 months ago
- AI Tensor Engine for ROCm ☆420 · Updated this week
- ☆162 · Sep 15, 2023 · Updated 2 years ago
- Lightweight Python wrapper for OpenVINO, enabling LLM inference on NPUs ☆29 · Dec 17, 2024 · Updated last year
- Standalone Flash Attention v2 kernel without libtorch dependency ☆113 · Sep 10, 2024 · Updated last year
- Guides to hopefully simplify the process of using ROCm. ☆12 · Sep 26, 2024 · Updated last year
- Convenient, fast text-to-speech with Whisper Speech by Collabora in ComfyUI; you can train a voice on the fly ☆43 · Mar 9, 2025 · Updated last year
- ☆24 · Jul 16, 2025 · Updated 9 months ago
- Quick and easy Diffusers CLI ☆15 · Apr 28, 2026 · Updated last week
- ☆15 · Feb 23, 2025 · Updated last year
- 8-bit CUDA functions for PyTorch ☆72 · Sep 24, 2025 · Updated 7 months ago
- ☆17 · Apr 30, 2025 · Updated last year
- The HIP Environment and ROCm Kit: a lightweight open-source build system for HIP and ROCm ☆979 · Updated this week
- ☆24 · May 22, 2024 · Updated last year
- ☆87 · Jan 23, 2025 · Updated last year
- hipDF: GPU DataFrame Library ☆16 · Mar 16, 2026 · Updated last month
- ☆66 · Oct 25, 2025 · Updated 6 months ago
- Open Containers distribution spec module for Django (under development) ☆17 · Jan 2, 2023 · Updated 3 years ago
- This project is about convolution operator optimization on GPUs, including GEMM-based (implicit GEMM) convolution. ☆43 · Sep 29, 2025 · Updated 7 months ago
- ☆15 · Oct 9, 2022 · Updated 3 years ago
- ☆67 · Updated this week
- Automated Design of Agentic Systems ☆10 · Sep 7, 2024 · Updated last year
- FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA. ☆276 · Updated this week
- Expert Specialization MoE Solution based on CUTLASS ☆26 · Apr 14, 2026 · Updated 2 weeks ago
- ☆49 · Mar 3, 2024 · Updated 2 years ago
- Updated this repository (https://github.com/afuzzyllama/VoronoiDiagramUE4) to work with at least version 4.26 ☆11 · Jun 5, 2021 · Updated 4 years ago
- ☆67 · Feb 23, 2026 · Updated 2 months ago
- Does all kinds of cool stuff to make analyzing meta classes easier. Now featuring WRedLogger.py, the previous backend of NetDbg ☆10 · Jun 7, 2023 · Updated 2 years ago