mag- / gpu_benchmark
GPU benchmark
☆69 Updated 8 months ago
Alternatives and similar repositories for gpu_benchmark
Users who are interested in gpu_benchmark are comparing it to the libraries listed below.
- High-Performance SGEMM on CUDA devices ☆107 Updated 8 months ago
- 👷 Build compute kernels ☆158 Updated this week
- train with kittens! ☆63 Updated 11 months ago
- A collection of tricks and tools to speed up transformer models ☆182 Updated last week
- ☆60 Updated 3 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 Updated 10 months ago
- ☆64 Updated 6 months ago
- RWKV-7: Surpassing GPT ☆97 Updated 10 months ago
- Token Omission Via Attention ☆127 Updated last year
- Samples of good AI generated CUDA kernels ☆91 Updated 4 months ago
- H-Net Dynamic Hierarchical Architecture ☆80 Updated last month
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆73 Updated 8 months ago
- ☆102 Updated this week
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆66 Updated 6 months ago
- ring-attention experiments ☆153 Updated 11 months ago
- QuIP quantization ☆59 Updated last year
- Load compute kernels from the Hub ☆299 Updated this week
- [WIP] Better (FP8) attention for Hopper ☆33 Updated 7 months ago
- ☆89 Updated last year
- Work in progress. ☆74 Updated 3 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆167 Updated last week
- Experiment of using Tangent to autodiff triton ☆80 Updated last year
- Experimental GPU language with meta-programming ☆23 Updated last year
- Inference of Mamba models in pure C ☆191 Updated last year
- Docker image for NVIDIA GH200 machines, optimized for vllm serving and hf trainer finetuning ☆50 Updated 7 months ago
- ☆21 Updated 7 months ago
- Official implementation for Training LLMs with MXFP4 ☆96 Updated 5 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 Updated last year
- Make triton easier ☆48 Updated last year
- ☆57 Updated last year