mag- / gpu_benchmark
GPU benchmark
☆73 Updated 10 months ago
Alternatives and similar repositories for gpu_benchmark
Users who are interested in gpu_benchmark are comparing it to the libraries listed below.
- High-Performance SGEMM on CUDA devices ☆113 Updated 10 months ago
- A collection of tricks and tools to speed up transformer models ☆192 Updated last month
- Experimental GPU language with meta-programming ☆24 Updated last year
- ☆65 Updated 5 months ago
- ☆76 Updated 11 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆66 Updated 8 months ago
- train with kittens! ☆63 Updated last year
- Experiment of using Tangent to autodiff triton ☆81 Updated last year
- ☆112 Updated 3 weeks ago
- Samples of good AI generated CUDA kernels ☆94 Updated 6 months ago
- [WIP] Better (FP8) attention for Hopper ☆32 Updated 9 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆214 Updated last week
- RWKV-7: Surpassing GPT ☆101 Updated last year
- Learning about CUDA by writing PTX code. ☆150 Updated last year
- ☆91 Updated last year
- ☆66 Updated 8 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 Updated last year
- ring-attention experiments ☆160 Updated last year
- Make triton easier ☆49 Updated last year
- 👷 Build compute kernels ☆193 Updated this week
- DeMo: Decoupled Momentum Optimization ☆197 Updated last year
- ☆105 Updated 4 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆46 Updated last year
- Load compute kernels from the Hub ☆347 Updated this week
- ☆18 Updated last year
- LLM training in simple, raw C/CUDA ☆108 Updated last year
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆85 Updated 3 months ago
- PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu ☆75 Updated last year
- research impl of Native Sparse Attention (2502.11089) ☆63 Updated 9 months ago
- PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP ☆141 Updated 3 months ago