tinygrad / open-gpu-kernel-modules
NVIDIA Linux open GPU with P2P support
☆1,263 · Updated 4 months ago
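For context on what "P2P support" means here: the fork patches NVIDIA's open kernel modules so that peer-to-peer DMA between consumer GPUs (e.g. RTX 4090s) is reported and usable. Below is a minimal sketch, using only the standard CUDA runtime API (not code from this repo), that queries peer access between every GPU pair; the filename is hypothetical.

```c
/* p2p_check.cu - minimal sketch, assuming a standard CUDA toolkit install.
 * With the stock driver, consumer GPUs typically report no peer access;
 * the point of this fork is to make these checks pass on cards like the
 * RTX 4090. */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n < 2) {
        fprintf(stderr, "need at least two CUDA devices\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int ok = 0;
            /* ok becomes 1 if device i can access device j's memory over P2P */
            cudaDeviceCanAccessPeer(&ok, i, j);
            printf("GPU %d -> GPU %d: P2P %s\n", i, j, ok ? "yes" : "no");
        }
    }
    return 0;
}
```

Build with `nvcc -o p2p_check p2p_check.cu` and run; with the patched modules loaded, GPU pairs that support P2P should report "yes".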
Alternatives and similar repositories for open-gpu-kernel-modules
Users interested in open-gpu-kernel-modules are comparing it to the libraries listed below.
- ☆1,057 · Updated 5 months ago
- Tile primitives for speedy kernels · ☆2,821 · Updated last week
- Distributed Training Over-The-Internet · ☆961 · Updated this week
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel · ☆1,891 · Updated this week
- ☆448 · Updated 6 months ago
- Flash Attention in ~100 lines of CUDA (forward pass only) · ☆945 · Updated 9 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) · ☆883 · Updated last month
- llama.cpp fork with additional SOTA quants and improved performance · ☆1,258 · Updated this week
- VPTQ, a flexible and extreme low-bit quantization algorithm · ☆659 · Updated 5 months ago
- FlashAttention (Metal Port) · ☆542 · Updated last year
- ☆559 · Updated 11 months ago
- TinyChatEngine: On-Device LLM Inference Library · ☆903 · Updated last year
- Serving multiple LoRA-finetuned LLMs as one · ☆1,101 · Updated last year
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens (see the arithmetic sketch after this list) · ☆911 · Updated last year
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training · ☆537 · Updated 9 months ago
- Puzzles for learning Triton · ☆2,036 · Updated 11 months ago
- Large-scale LLM inference engine · ☆1,567 · Updated last week
- DFloat11: Lossless LLM Compression for Efficient GPU Inference · ☆548 · Updated last month
- Fast and memory-efficient exact attention · ☆193 · Updated this week
- Advanced quantization algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA, and HPU · ☆668 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton · ☆578 · Updated 2 months ago
- INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model · ☆1,548 · Updated 6 months ago
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… · ☆2,163 · Updated last year
- Linux-based GDDR6/GDDR6X VRAM temperature reader for NVIDIA RTX 3000/4000 series GPUs · ☆104 · Updated 5 months ago
- kernels, of the mega variety · ☆586 · Updated 3 weeks ago
- ☆248 · Updated last year
- FlashInfer: Kernel Library for LLM Serving · ☆3,911 · Updated this week
- LLM training in simple, raw C/HIP for AMD GPUs · ☆51 · Updated last year
- CUDA/Metal-accelerated language model inference · ☆615 · Updated 4 months ago
- Scalable and robust tree-based speculative decoding algorithm · ☆359 · Updated 8 months ago
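On the "~4x speedups" claim in the FP16xINT4 entry above: at batch sizes that small, decode-time GEMMs are bound by the time it takes to stream the weights from VRAM, so shrinking weights from 16 bits to 4 bits cuts that traffic, and hence the runtime ceiling, by about 4x. A back-of-envelope sketch; the 7B parameter count and 1 TB/s bandwidth are illustrative assumptions, not the repo's benchmarks:

```c
/* Back-of-envelope arithmetic for memory-bound decode (assumed numbers). */
#include <stdio.h>

int main(void) {
    const double params     = 7e9;           /* e.g. a 7B-parameter model */
    const double fp16_bytes = params * 2.0;  /* 16-bit weights */
    const double int4_bytes = params * 0.5;  /*  4-bit weights */
    const double bw         = 1.0e12;        /* ~1 TB/s VRAM bandwidth */

    /* Each decode step must read every weight once, so weight traffic
     * lower-bounds per-token latency in the memory-bound regime. */
    printf("FP16 weight read per token: %.1f ms\n", 1e3 * fp16_bytes / bw);
    printf("INT4 weight read per token: %.1f ms\n", 1e3 * int4_bytes / bw);
    printf("theoretical ceiling: %.1fx\n", fp16_bytes / int4_bytes);
    return 0;
}
```

Past roughly batch 16-32 the GEMMs become compute-bound and the advantage shrinks, which matches the "up to medium batch sizes" wording in the entry.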