tinygrad / open-gpu-kernel-modules
NVIDIA Linux open GPU with P2P support
☆903 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for open-gpu-kernel-modules
- Tile primitives for speedy kernels ☆1,643 · Updated this week
- Distributed Training Over-The-Internet ☆683 · Updated 2 months ago
- Flash Attention in ~100 lines of CUDA (forward pass only; a NumPy sketch of the idea follows this list) ☆615 · Updated 7 months ago
- An implementation of bucketMul LLM inference ☆214 · Updated 4 months ago
- Open weights language model from Google DeepMind, based on Griffin. ☆607 · Updated 4 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆697 · Updated last week
- Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and in… ☆1,480 · Updated 3 weeks ago
- FlashAttention (Metal Port) ☆382 · Updated last month
- Stateful load balancer custom-tailored for llama.cpp ☆557 · Updated last week
- llama3.np is a pure NumPy implementation of the Llama 3 model. ☆973 · Updated 5 months ago
- Llama 2 Everywhere (L2E) ☆1,511 · Updated 2 weeks ago
- nanoGPT style version of Llama 3.1 ☆1,231 · Updated 3 months ago
- A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations ☆732 · Updated this week
- Puzzles for learning Triton ☆1,089 · Updated last month
- Make PyTorch models up to 40% faster! Thunder is a source to source compiler for PyTorch. It enables using different hardware executors a… ☆1,190 · Updated this week
- Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, an… ☆971 · Updated this week
- Stop messing around with finicky sampling parameters and just use DRµGS! ☆317 · Updated 5 months ago
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model. ☆305 · Updated 5 months ago
- Minimal LLM inference in Rust ☆915 · Updated 2 weeks ago
- Serving multiple LoRA-finetuned LLMs as one (a sketch of the idea follows this list) ☆979 · Updated 6 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆479 · Updated 2 weeks ago
- NanoGPT (124M) quality in 7.8 8xH100-minutes ☆965 · Updated this week
- Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA ☆601 · Updated last week
- Large-scale LLM inference engine ☆1,111 · Updated this week
- Finetune llama2-70b and codellama on MacBook Air without quantization ☆447 · Updated 7 months ago
- Felafax is building AI infra for non-NVIDIA GPUs ☆503 · Updated last week
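
Two entries above are compact enough that the underlying idea fits in a short sketch. The flash-attention entry implements the forward pass as a CUDA kernel; the core trick is streaming over K/V tiles with an online softmax so the full attention matrix is never materialized. Below is a minimal NumPy sketch of that idea, not the repo's code; the function name and tile size are assumptions.

```python
import numpy as np

def attention_forward_tiled(Q, K, V, block=64):
    """Tiled attention forward pass with online softmax (illustrative sketch).

    The listed repo does this as a CUDA kernel; here each K/V slice
    stands in for the tile a thread block would load into shared memory.
    """
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)          # output accumulator
    m = np.full(n, -np.inf)       # running row-wise max of scores
    l = np.zeros(n)               # running softmax denominator
    for start in range(0, n, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = (Q @ Kb.T) * scale                  # scores for this tile only
        m_new = np.maximum(m, S.max(axis=1))
        corr = np.exp(m - m_new)                # rescale earlier partial sums
        P = np.exp(S - m_new[:, None])
        l = l * corr + P.sum(axis=1)
        O = O * corr[:, None] + P @ Vb
        m = m_new
    return O / l[:, None]         # equals softmax(Q @ K.T / sqrt(d)) @ V
```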
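
The multi-LoRA serving entry rests on a similarly small idea: every request in a batch shares one frozen base weight, and only a low-rank per-request delta differs. Here is a minimal sketch of that idea; `batched_lora_linear` is a hypothetical helper, the usual alpha/r scaling is omitted, and real serving engines fuse the per-adapter work into batched GEMMs rather than a Python loop.

```python
import numpy as np

def batched_lora_linear(x, W, adapters, adapter_ids):
    """One shared base matmul plus a per-request low-rank LoRA delta.

    x           : (batch, d_in) activations
    W           : (d_in, d_out) frozen base weight, shared by all requests
    adapters    : dict mapping adapter id -> (A, B), A: (d_in, r), B: (r, d_out)
    adapter_ids : one adapter id per row of x (None = base model only)

    Hypothetical sketch; the alpha/r scaling is omitted for brevity.
    """
    y = x @ W                           # base model path, shared by everyone
    for i, aid in enumerate(adapter_ids):
        if aid is not None:
            A, B = adapters[aid]
            y[i] += (x[i] @ A) @ B      # rank-r correction for this request
    return y

# Hypothetical usage: two requests in one batch, each with its own adapter.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))
adapters = {
    "sql":  (rng.normal(size=(16, 4)), rng.normal(size=(4, 8))),
    "chat": (rng.normal(size=(16, 4)), rng.normal(size=(4, 8))),
}
x = rng.normal(size=(2, 16))
y = batched_lora_linear(x, W, adapters, ["sql", "chat"])  # shape (2, 8)
```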