tinygrad / open-gpu-kernel-modules
NVIDIA Linux open GPU with P2P support
☆1,251 · Updated 3 months ago
Alternatives and similar repositories for open-gpu-kernel-modules
Users interested in open-gpu-kernel-modules are comparing it to the libraries listed below.
- ☆1,055 · Updated 4 months ago
- Distributed Training Over-The-Internet ☆959 · Updated 4 months ago
- Tile primitives for speedy kernels ☆2,704 · Updated last week
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel ☆1,827 · Updated this week
- ☆450 · Updated 5 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆1,198 · Updated this week
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training ☆534 · Updated 8 months ago
- prime is a framework for efficient, globally distributed training of AI models over the internet. ☆826 · Updated 4 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆878 · Updated 2 weeks ago
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆931 · Updated 8 months ago
- VPTQ, a flexible and extreme low-bit quantization algorithm ☆658 · Updated 5 months ago
- FlashAttention (Metal Port) ☆534 · Updated last year
- Juice Community Version Public Release ☆603 · Updated 4 months ago
- Serving multiple LoRA-finetuned LLMs as one ☆1,088 · Updated last year
- ☆555 · Updated 10 months ago
- Open weights language model from Google DeepMind, based on Griffin. ☆651 · Updated 3 months ago
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens. ☆900 · Updated last year
- Puzzles for learning Triton ☆1,992 · Updated 10 months ago
- A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full… ☆624 · Updated 6 months ago
- Llama 2 Everywhere (L2E) ☆1,524 · Updated 3 weeks ago
- FlashInfer: Kernel Library for LLM Serving ☆3,787 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆576 · Updated last month
- Large-scale LLM inference engine ☆1,552 · Updated this week
- LLM training in simple, raw C/HIP for AMD GPUs ☆52 · Updated last year
- ☆248 · Updated last year
- Tutorials on tinygrad ☆414 · Updated last month
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,326 · Updated last month
- TinyChatEngine: On-Device LLM Inference Library ☆896 · Updated last year
- CUDA/Metal accelerated language model inference ☆614 · Updated 3 months ago
- PyTorch native quantization and sparsity for training and inference ☆2,375 · Updated this week