tinygrad / open-gpu-kernel-modules
NVIDIA Linux open GPU with P2P support
☆1,316 · Updated 8 months ago
Alternatives and similar repositories for open-gpu-kernel-modules
Users interested in open-gpu-kernel-modules are comparing it to the repositories listed below.
- ☆1,074 · Updated 8 months ago
- ☆451 · Updated 10 months ago
- Distributed Training Over-The-Internet · ☆975 · Updated 3 months ago
- Tile primitives for speedy kernels · ☆3,120 · Updated this week
- Official implementation of Half-Quadratic Quantization (HQQ) · ☆912 · Updated last month
- FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens · ☆1,005 · Updated last year
- llama.cpp fork with additional SOTA quants and improved performance · ☆1,587 · Updated this week
- Flash Attention in ~100 lines of CUDA (forward pass only) · ☆1,067 · Updated last year
- ☆577 · Updated last year
- FlashAttention (Metal port) · ☆579 · Updated last year
- Mirage Persistent Kernel: compiling LLMs into a MegaKernel · ☆2,104 · Updated last week
- Large-scale LLM inference engine · ☆1,641 · Updated 2 weeks ago
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training · ☆562 · Updated last year
- Serving multiple LoRA-finetuned LLMs as one · ☆1,140 · Updated last year
- 🎯 An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality degradation across Weight-Only Quantiza… · ☆839 · Updated this week
- Fast and memory-efficient exact attention · ☆213 · Updated last week
- CUDA/Metal-accelerated language model inference · ☆626 · Updated 8 months ago
- TinyChatEngine: On-Device LLM Inference Library · ☆941 · Updated last year
- VPTQ: a flexible, extreme low-bit quantization algorithm · ☆674 · Updated 9 months ago
- NVIDIA Linux open GPU with P2P support · ☆126 · Updated 2 months ago
- Linux-based GDDR6/GDDR6X VRAM temperature reader for NVIDIA RTX 3000/4000 series GPUs · ☆109 · Updated 9 months ago
- AI Tensor Engine for ROCm · ☆348 · Updated this week
- Open-weights language model from Google DeepMind, based on Griffin · ☆661 · Updated 2 weeks ago
- LLM training in simple, raw C/HIP for AMD GPUs · ☆58 · Updated last year
- Llama 2 Everywhere (L2E) · ☆1,526 · Updated 5 months ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs · ☆626 · Updated last week
- ☆250 · Updated last year
- INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model · ☆1,563 · Updated 10 months ago
- Low-bit LLM inference on CPU/NPU with lookup table · ☆916 · Updated 8 months ago
- prime: a framework for efficient, globally distributed training of AI models over the internet · ☆850 · Updated 2 months ago
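Several of the repositories above (HQQ, VPTQ, the FP16xINT4 kernel, the quantization toolkits) center on low-bit weight quantization. As a rough illustration of the common core idea, here is a minimal sketch of group-wise symmetric INT4 quantization in plain Python. All names, the group size, and the scheme itself are illustrative assumptions, not any listed project's actual API.

```python
# Hypothetical sketch of group-wise symmetric INT4 weight quantization.
# Each group of weights shares one float scale; weights are stored as
# integer codes in [-8, 7] and reconstructed as code * scale.

def quantize_int4(weights, group_size=4):
    """Quantize a flat list of floats to INT4 codes with per-group scales."""
    codes, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        # Symmetric scale: map the largest magnitude in the group to code 7.
        scale = max(abs(w) for w in group) / 7 or 1.0  # avoid 0 for all-zero groups
        scales.append(scale)
        codes.append([max(-8, min(7, round(w / scale))) for w in group])
    return codes, scales

def dequantize_int4(codes, scales):
    """Reconstruct approximate floats from codes and per-group scales."""
    return [q * s for group, s in zip(codes, scales) for q in group]

w = [0.12, -0.5, 0.33, 0.9, -1.2, 0.07, 0.45, -0.88]
codes, scales = quantize_int4(w)
w_hat = dequantize_int4(codes, scales)
err = max(abs(a - b) for a, b in zip(w, w_hat))
```

Smaller groups give each scale less dynamic range to cover, which lowers the per-weight error at the cost of more scale storage; the projects above differ mainly in how they pick scales, pack codes, and run the packed matmul fast.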