ReinForce-II / mmapeak
☆34Updated 4 months ago
Alternatives and similar repositories for mmapeak
Users that are interested in mmapeak are comparing it to the libraries listed below
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.☆73Updated 6 months ago
- Gpu benchmark☆65Updated 6 months ago
- ☆146Updated last month
- ☆73Updated 7 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7)☆66Updated 4 months ago
- Samples of good AI generated CUDA kernels☆88Updated 2 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk☆142Updated this week
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs"☆154Updated 9 months ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".☆277Updated last year
- High-Performance SGEMM on CUDA devices☆98Updated 6 months ago
- Learning about CUDA by writing PTX code.☆133Updated last year
- Inference RWKV v7 in pure C.☆37Updated 2 weeks ago
- ☆17Updated 8 months ago
- ☆77Updated last month
- DFloat11: Lossless LLM Compression for Efficient GPU Inference☆504Updated this week
- ring-attention experiments☆147Updated 9 months ago
- Fast low-bit matmul kernels in Triton☆339Updated last week
- Inference of Mamba models in pure C☆190Updated last year
- ☆216Updated 6 months ago
- a simple Flash Attention v2 implementation with ROCM (RDNA3 GPU, roc wmma), mainly used for stable diffusion(ComfyUI) in Windows ZLUDA en…☆45Updated 11 months ago
- LLM Inference on consumer devices☆123Updated 4 months ago
- [WIP] Better (FP8) attention for Hopper☆32Updated 5 months ago
- Custom PTX Instruction Benchmark☆126Updated 5 months ago
- Docker image NVIDIA GH200 machines - optimized for vllm serving and hf trainer finetuning☆47Updated 5 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs☆88Updated this week
- Fast and memory-efficient exact attention☆180Updated this week
- TritonParse: A Compiler Tracer, Visualizer, and mini-Reproducer(WIP) for Triton Kernels☆139Updated this week
- NVIDIA Linux open GPU with P2P support☆27Updated last week
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!☆71Updated this week
- scalable and robust tree-based speculative decoding algorithm☆354Updated 6 months ago