mag- / gpu_benchmark
GPU benchmark
☆50Updated 3 months ago
Alternatives and similar repositories for gpu_benchmark:
Users who are interested in gpu_benchmark are comparing it to the repositories listed below
- ☆75Updated 6 months ago
- SGEMM that beats cuBLAS☆36Updated this week
- Focused on fast experimentation and simplicity☆64Updated 3 weeks ago
- Supporting PyTorch FSDP for optimizers☆75Updated last month
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers.☆14Updated 2 weeks ago
- ☆27Updated 6 months ago
- Experiment of using Tangent to autodiff triton☆74Updated 11 months ago
- ☆49Updated 10 months ago
- A JAX-like function transformation engine, but micro: microjax☆30Updated 2 months ago
- PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu☆47Updated last month
- Implementation of https://arxiv.org/pdf/2312.09299☆20Updated 6 months ago
- Normalized Transformer (nGPT)☆145Updated last month
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters☆110Updated last month
- [WIP] Transformer to embed Danbooru labelsets☆13Updated 9 months ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM☆50Updated 9 months ago
- Collection of autoregressive model implementations☆76Updated last week
- ☆54Updated 3 weeks ago
- Train, tune, and run inference with the Bamba model☆75Updated this week
- ☆53Updated last year
- Make triton easier☆42Updated 7 months ago
- RWKV-7: Surpassing GPT☆68Updated 2 months ago
- Implementation of DreamerV3 in PyTorch☆42Updated last month
- QuIP quantization☆48Updated 10 months ago
- ☆99Updated 3 weeks ago
- Testing LLM reasoning abilities with family relationship quizzes.☆55Updated this week
- ring-attention experiments☆116Updated 3 months ago
- Working implementation of DeepSeek MLA☆23Updated last week
- ☆20Updated 2 months ago
- Inference of Mamba models in pure C☆183Updated 10 months ago
- ☆45Updated last year