My submission for the GPUMODE/AMD fp8 mm challenge
☆29 Jun 4, 2025 Updated 10 months ago
Alternatives and similar repositories for gpumode-amd-fp8-mm
Users that are interested in gpumode-amd-fp8-mm are comparing it to the libraries listed below.
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆19 Feb 9, 2026 Updated 2 months ago
- Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X ☆76 Feb 11, 2026 Updated 2 months ago
- ☆18 Updated this week
- ☆18 Jun 6, 2025 Updated 10 months ago
- ☆32 Jul 2, 2025 Updated 9 months ago
- ☆23 Apr 7, 2026 Updated last week
- Automating analysis from trace files ☆66 Apr 10, 2026 Updated last week
- Quantized LLM training in pure CUDA/C++. ☆244 Mar 6, 2026 Updated last month
- ☆17 Aug 5, 2025 Updated 8 months ago
- Python library to add support for embedding natural code in Python with shared program state. ☆29 Jan 20, 2026 Updated 2 months ago
- A fault-tolerant RDMA-based disaggregated key-value store with 1-RTT UPDATEs and GETs thanks to the SWARM replication protocol ☆14 Sep 25, 2024 Updated last year
- documentation used in my projects ☆19 Updated this week
- [ICLR 2022 Spotlight] Multi-Stage Episodic Control for Strategic Exploration in Text Games ☆15 Feb 8, 2026 Updated 2 months ago
- Learn CUDA with PyTorch ☆274 Apr 9, 2026 Updated last week
- Experimental web-based simulator for exploring metastable behaviors in distributed systems ☆59 Updated this week
- A wrapper around libssh2 for .NET ☆30 Jan 21, 2026 Updated 2 months ago
- DeeperGEMM: crazy optimized version ☆86 May 5, 2025 Updated 11 months ago
- Personal Finance Expense Tracker ☆19 Nov 14, 2025 Updated 5 months ago
- Hand-Rolled GPU communications library ☆92 Nov 25, 2025 Updated 4 months ago
- Wave: Python Domain-Specific Language for High Performance Machine Learning ☆53 Updated this week
- CLaMR: Contextualized Late-Interaction for Multimodal Content Retrieval ☆24 Jun 28, 2025 Updated 9 months ago
- We develop world models that can be adapted with natural language. Integrating these models into artificial agents allows humans to effe… ☆25 Feb 10, 2024 Updated 2 years ago
- A powerful, interactive Python CLI for converting, manipulating, and inspecting media files using FFmpeg 🎬 ☆19 Updated this week
- implementations of some research papers 😴😴 ☆22 Jul 28, 2025 Updated 8 months ago
- Raptor is a modern, fast, and easy-to-use system for building disk images, bootable isos, containers and much more, from a simple, Docker… ☆39 Feb 10, 2026 Updated 2 months ago
- Compare AI model pricing and performance in a simple interactive web app. ☆18 Apr 10, 2026 Updated last week
- key/value store for Python based on Cloudflare workers ☆33 Jun 13, 2025 Updated 10 months ago
- ☆13 Updated this week
- Simple application for tracking and managing a home schooling program. ☆40 Sep 13, 2025 Updated 7 months ago
- Scans for used translations, compares with your translations file and removes the ones that are not in use. ☆17 Nov 21, 2025 Updated 4 months ago
- Camera app drawn on SkiaSharp canvas with real-time SKSL shaders. Built-in desktop shader editor. Made with DrawnUI for .NET MAUI. ☆24 Updated this week
- Tutorial for TikZ ☆11 Apr 3, 2025 Updated last year
- A modular, privacy-minded translation extension for browsers ☆28 Apr 8, 2026 Updated last week
- Docker/podman container for llama.cpp/vllm/exllamav{2,3} orchestrated using llama-swap ☆18 Apr 10, 2026 Updated last week
- ☆17 Jul 1, 2025 Updated 9 months ago
- Write a fast kernel and see how you compare against the best humans and AI on gpumode.com ☆91 Updated this week
- Integration into BPM tools (Business Process Management software tools for modeling high-level and detailed processes) and EA (from busi… ☆17 Feb 28, 2026 Updated last month
- A collection of type-safe, async-friendly, and unopinionated enhancements to SQLAlchemy Core that work well with modern web servers ☆35 Dec 12, 2025 Updated 4 months ago
- A BMad Method compliant standalone module that has agents and workflows to help bring out the creativity of the user through various exe… ☆62 Updated this week