Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X
☆76Feb 11, 2026Updated 2 months ago
Alternatives and similar repositories for RadeonFlow_Kernels
Users that are interested in RadeonFlow_Kernels are comparing it to the libraries listed below.
- ☆18Updated this week
- Personal solutions to the Triton Puzzles☆20Jul 18, 2024Updated last year
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.☆19Feb 9, 2026Updated 2 months ago
- ☆32Jul 2, 2025Updated 9 months ago
- Wave: Python Domain-Specific Language for High Performance Machine Learning☆53Updated this week
- ☆18Jun 6, 2025Updated 10 months ago
- PyTorch distributed training acceleration framework☆54Aug 13, 2025Updated 8 months ago
- Optimizing diffusion for production-ready speeds☆39Jan 10, 2026Updated 3 months ago
- Automating analysis from trace files☆66Apr 10, 2026Updated last week
- LLM training in simple, raw C/HIP for AMD GPUs☆62Sep 23, 2024Updated last year
- ☆11Jun 9, 2023Updated 2 years ago
- An auxiliary project analysis of the characteristics of KV in DiT Attention.☆34Nov 29, 2024Updated last year
- ☆18Nov 11, 2025Updated 5 months ago
- Python library to add support for embedding natural code in Python with shared program state.☆29Jan 20, 2026Updated 2 months ago
- Pytorch routines for (Ker)nel (Mac)hines☆12Oct 10, 2025Updated 6 months ago
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.☆150May 10, 2025Updated 11 months ago
- ☆57Feb 24, 2026Updated last month
- Ahead of Time (AOT) Triton Math Library☆96Apr 8, 2026Updated last week
- ☆119May 19, 2025Updated 10 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators☆129Apr 10, 2026Updated last week
- ☆30Updated this week
- ☆12Sep 1, 2023Updated 2 years ago
- simple grpo☆12May 28, 2025Updated 10 months ago
- DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing (WACV 2025)☆13Feb 7, 2026Updated 2 months ago
- A bunch of kernels that might make stuff slower 😉☆87Updated this week
- Automated Design of Agentic Systems☆10Sep 7, 2024Updated last year
- Code repository for "Portal-Based Path Perturbation for Metropolis Light Transport"☆10Oct 26, 2020Updated 5 years ago
- AI Tensor Engine for ROCm☆406Updated this week
- Implementation of ADMM-based sparse CNN architecture.☆12Aug 30, 2017Updated 8 years ago
- ☆261Jul 11, 2024Updated last year
- Does all kind of cool stuff to make analyzing meta classes easier. Now featuring WRedLogger.py, the previous backend of NetDbg☆10Jun 7, 2023Updated 2 years ago
- MAD (Model Automation and Dashboarding)☆33Updated this week
- Extra notebooks for ECE-GY 6143☆27Jan 20, 2026Updated 2 months ago
- Automated High-Performance GPU Kernel Generation☆95Apr 11, 2026Updated last week
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.☆73Feb 2, 2025Updated last year
- The Taichi MPI demos with MPI4Py☆13Nov 3, 2022Updated 3 years ago
- A simple and minimal open source implementation of "Introducing LFM2: The Fastest On-Device Foundation Models on the Market" from Liquid …☆23Updated this week
- ☆11Aug 13, 2024Updated last year
- ☆16Oct 20, 2024Updated last year