ReinForce-II / mmapeak
☆51Updated last month
Alternatives and similar repositories for mmapeak
Users that are interested in mmapeak are comparing it to the libraries listed below
- NVIDIA Linux open GPU with P2P support☆126Updated last month
- ☆163Updated 7 months ago
- Fast and memory-efficient exact attention☆213Updated this week
- Fast low-bit matmul kernels in Triton☆424Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk☆237Updated this week
- a simple Flash Attention v2 implementation with ROCM (RDNA3 GPU, roc wmma), mainly used for stable diffusion(ComfyUI) in Windows ZLUDA en…☆51Updated last year
- Gpu benchmark☆74Updated last year
- AI Tensor Engine for ROCm☆348Updated this week
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.☆74Updated 11 months ago
- ☆117Updated 3 weeks ago
- ☆71Updated 7 months ago
- High-Performance FP32 GEMM on CUDA devices☆117Updated last year
- ☆277Updated this week
- An innovative library for efficient LLM inference via low-bit quantization☆352Updated last year
- ☆79Updated last year
- ☆91Updated this week
- An efficient GPU support for LLM inference with x-bit quantization (e.g. FP6,FP5).☆276Updated 6 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs☆93Updated this week
- Samples of good AI generated CUDA kernels☆99Updated 8 months ago
- AMD related optimizations for transformer models☆97Updated 3 months ago
- Ahead of Time (AOT) Triton Math Library☆88Updated this week
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels☆189Updated this week
- kernels, of the mega variety☆657Updated 4 months ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".☆280Updated 2 years ago
- DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference☆600Updated 2 months ago
- ☆18Updated last year
- ☆92Updated last year
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7)☆66Updated 10 months ago
- Fast and Furious AMD Kernels☆346Updated last week
- ☆206Updated 8 months ago