ReinForce-II / mmapeak
☆47 · Updated last week
Alternatives and similar repositories for mmapeak
Users interested in mmapeak are comparing it to the repositories listed below.
- NVIDIA Linux open GPU with P2P support ☆95 · Updated last week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆214 · Updated last week
- ☆159 · Updated 5 months ago
- GPU benchmark ☆73 · Updated 10 months ago
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆73 · Updated 10 months ago
- A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, roc wmma), mainly used for stable diffusion (ComfyUI) in Windows ZLUDA en… ☆50 · Updated last year
- Fast low-bit matmul kernels in Triton ☆407 · Updated 3 weeks ago
- ☆76 · Updated 11 months ago
- ☆112 · Updated 3 weeks ago
- Fast and memory-efficient exact attention ☆203 · Updated last week
- High-performance SGEMM on CUDA devices ☆113 · Updated 10 months ago
- DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference ☆572 · Updated 3 weeks ago
- REAP: Router-weighted Expert Activation Pruning for SMoE compression ☆145 · Updated last week
- Code for data-aware compression of DeepSeek models ☆66 · Updated last month
- LLM inference on consumer devices ☆128 · Updated 9 months ago
- AI Tensor Engine for ROCm ☆322 · Updated this week
- Ahead of Time (AOT) Triton Math Library ☆84 · Updated this week
- GPTQ inference Triton kernel ☆317 · Updated 2 years ago
- ☆65 · Updated 5 months ago
- Samples of good AI-generated CUDA kernels ☆94 · Updated 6 months ago
- Development repository for the Triton language and compiler ☆137 · Updated last week
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ☆279 · Updated 2 years ago
- Scalable and robust tree-based speculative decoding algorithm ☆363 · Updated 10 months ago
- ☆205 · Updated 7 months ago
- ☆81 · Updated last week
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆178 · Updated this week
- Ring-attention experiments ☆160 · Updated last year
- KV cache compression for high-throughput LLM inference ☆148 · Updated 10 months ago
- Explore training for quantized models ☆25 · Updated 5 months ago
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5) ☆272 · Updated 5 months ago