United-Compute / gpu-benchmark
Benchmark your GPU with ease
☆28Updated last month
Alternatives and similar repositories for gpu-benchmark
Users who are interested in gpu-benchmark compare it to the libraries listed below.
- Lego for GRPO☆30Updated 8 months ago
- Gpu benchmark☆74Updated last year
- ☆71Updated 7 months ago
- High-performance FlashAttention-2 for AMD, Intel, and Apple GPUs. Drop-in replacement for PyTorch SDPA. Triton backend for ROCm (MI300X, …☆146Updated last week
- High-throughput tensor loading for PyTorch☆221Updated 2 weeks ago
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.☆74Updated last year
- ☆68Updated last year
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens☆150Updated last month
- look how they massacred my boy☆63Updated last year
- ☆62Updated 6 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs"☆155Updated last year
- RWKV-7: Surpassing GPT☆104Updated last year
- Inference of Mamba and Mamba2 models in pure C☆196Updated 2 weeks ago
- Testing LLM reasoning abilities with family relationship quizzes.☆63Updated last year
- ☆40Updated last year
- 1.58-bit LLaMa model☆82Updated last year
- llama.cpp to PyTorch Converter☆36Updated last year
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆59Updated 3 months ago
- ☆137Updated last year
- MLX port for xjdr's entropix sampler (mimics jax implementation)☆61Updated last year
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆110Updated 11 months ago
- An unsupervised model merging algorithm for Transformers-based language models.☆108Updated last year
- Experimental GPU language with meta-programming☆24Updated last year
- Sparse Inferencing for transformer-based LLMs☆218Updated 5 months ago
- An introduction to LLM Sampling☆79Updated last year
- Samples of good AI generated CUDA kernels☆99Updated 8 months ago
- Cerule - A Tiny Mighty Vision Model☆68Updated 2 months ago
- Low-Rank adapter extraction for fine-tuned transformers models☆180Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters☆131Updated last year
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe…☆88Updated this week