tinygrad / open-gpu-kernel-modules
NVIDIA Linux open GPU with P2P support
☆1,188 · Updated last month
Alternatives and similar repositories for open-gpu-kernel-modules
Users interested in open-gpu-kernel-modules are comparing it to the repositories listed below.
- ☆1,043 · Updated 2 months ago
- Distributed Training Over-The-Internet ☆946 · Updated 2 months ago
- Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA ☆1,574 · Updated this week
- Tile primitives for speedy kernels ☆2,523 · Updated this week
- ☆448 · Updated 3 months ago
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆870 · Updated 6 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆845 · Updated last week
- Serving multiple LoRA finetuned LLMs as one ☆1,073 · Updated last year
- llama.cpp fork with additional SOTA quants and improved performance ☆686 · Updated this week
- FlashAttention (Metal port) ☆506 · Updated 9 months ago
- VPTQ, a flexible and extreme low-bit quantization algorithm ☆647 · Updated 2 months ago
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆792 · Updated this week
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training ☆516 · Updated 6 months ago
- ☆547 · Updated 8 months ago
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens ☆859 · Updated 10 months ago
- Large-scale LLM inference engine ☆1,477 · Updated this week
- Open-weights language model from Google DeepMind, based on Griffin ☆644 · Updated last month
- CUDA/Metal accelerated language model inference ☆594 · Updated last month
- Puzzles for learning Triton ☆1,760 · Updated 8 months ago
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆446 · Updated last month
- AI Tensor Engine for ROCm ☆232 · Updated this week
- A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full… ☆620 · Updated 3 months ago
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆388 · Updated last month
- llama3.np, a pure NumPy implementation of the Llama 3 model ☆987 · Updated 2 months ago
- Stop messing around with finicky sampling parameters and just use DRµGS! ☆349 · Updated last year
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,236 · Updated this week
- Llama 2 Everywhere (L2E) ☆1,519 · Updated 6 months ago
- Scalable and robust tree-based speculative decoding algorithm ☆349 · Updated 5 months ago
- ☆248 · Updated last year
- prime, a framework for efficient, globally distributed training of AI models over the internet ☆781 · Updated last month