mag- / gpu_benchmark
GPU benchmark
☆72 · Updated 9 months ago
Alternatives and similar repositories for gpu_benchmark
Users interested in gpu_benchmark are comparing it to the libraries listed below
- High-Performance SGEMM on CUDA devices ☆109 · Updated 9 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 · Updated 11 months ago
- Samples of good AI-generated CUDA kernels ☆91 · Updated 5 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated last year
- ☆60 · Updated 4 months ago
- ring-attention experiments ☆155 · Updated last year
- ☆89 · Updated last year
- [WIP] Better (FP8) attention for Hopper ☆33 · Updated 8 months ago
- ☆105 · Updated last week
- Load compute kernels from the Hub ☆316 · Updated last week
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆74 · Updated 9 months ago
- FlexAttention w/ FlashAttention3 Support ☆27 · Updated last year
- Research implementation of Native Sparse Attention (arXiv:2502.11089) ☆62 · Updated 8 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆187 · Updated this week
- Experiment using Tangent to autodiff Triton ☆79 · Updated last year
- RWKV-7: Surpassing GPT ☆98 · Updated 11 months ago
- train with kittens! ☆63 · Updated last year
- PTX-Tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7) ☆66 · Updated 7 months ago
- Make Triton easier ☆48 · Updated last year
- 👷 Build compute kernels ☆166 · Updated this week
- ☆18 · Updated 11 months ago
- Experimental GPU language with meta-programming ☆23 · Updated last year
- ☆176 · Updated last year
- Docker image for NVIDIA GH200 machines, optimized for vLLM serving and HF Trainer finetuning ☆50 · Updated 8 months ago
- Token Omission Via Attention ☆127 · Updated last year
- ☆65 · Updated 7 months ago
- Simple high-throughput inference library ☆149 · Updated 5 months ago
- ☆91 · Updated last year
- QuIP quantization ☆59 · Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆223 · Updated last year