mag- / gpu_benchmark
GPU benchmark
☆68 · Updated 7 months ago
Alternatives and similar repositories for gpu_benchmark
Users interested in gpu_benchmark are comparing it to the libraries listed below.
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆73 · Updated 7 months ago
- High-Performance SGEMM on CUDA devices ☆101 · Updated 8 months ago
- Samples of good AI-generated CUDA kernels ☆90 · Updated 3 months ago
- RWKV-7: Surpassing GPT ☆95 · Updated 10 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆129 · Updated 9 months ago
- ☆64 · Updated 6 months ago
- ring-attention experiments ☆152 · Updated 11 months ago
- ☆94 · Updated 3 weeks ago
- 👷 Build compute kernels ☆143 · Updated this week
- PTX tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7) ☆66 · Updated 5 months ago
- [WIP] Better (FP8) attention for Hopper ☆32 · Updated 6 months ago
- ☆17 · Updated 9 months ago
- H-Net Dynamic Hierarchical Architecture ☆79 · Updated last week
- Inference of Mamba models in pure C ☆191 · Updated last year
- ☆56 · Updated 3 months ago
- ☆76 · Updated 8 months ago
- Docker image for NVIDIA GH200 machines, optimized for vLLM serving and HF Trainer finetuning ☆48 · Updated 6 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆155 · Updated 11 months ago
- LLM training in simple, raw C/CUDA ☆104 · Updated last year
- Load compute kernels from the Hub ☆283 · Updated this week
- Experimental GPU language with meta-programming ☆23 · Updated last year
- Research implementation of Native Sparse Attention (2502.11089) ☆61 · Updated 7 months ago
- Make Triton easier ☆47 · Updated last year
- QuIP quantization ☆59 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆161 · Updated this week
- Train with kittens! ☆62 · Updated 10 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8 ☆46 · Updated last year
- ☆89 · Updated last year
- How to ensure correctness and ship LLM-generated kernels in PyTorch ☆58 · Updated this week
- FlexAttention with FlashAttention3 support ☆27 · Updated 11 months ago