mikex86 / LibreCuda
☆1,043 · Updated last month
Alternatives and similar repositories for LibreCuda
Users interested in LibreCuda are comparing it to the libraries listed below.
- ☆448 · Updated 3 months ago
- NVIDIA Linux open GPU with P2P support · ☆1,186 · Updated last month
- ☆188 · Updated 10 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs · ☆350 · Updated 2 months ago
- Nvidia Instruction Set Specification Generator · ☆280 · Updated last year
- ☆248 · Updated last year
- Algebraic enhancements for GEMM & AI accelerators · ☆277 · Updated 4 months ago
- A reimplementation of Stable Diffusion 3.5 in pure PyTorch · ☆637 · Updated last month
- llama3.np is a pure NumPy implementation of the Llama 3 model · ☆986 · Updated 2 months ago
- Richard is gaining power · ☆192 · Updated 3 weeks ago
- Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, an… · ☆1,430 · Updated last week
- Apple AMX Instruction Set · ☆1,098 · Updated 6 months ago
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 · ☆790 · Updated this week
- Docker-based inference engine for AMD GPUs · ☆231 · Updated 9 months ago
- Felafax is building AI infra for non-NVIDIA GPUs · ☆566 · Updated 5 months ago
- Minimal LLM inference in Rust · ☆1,003 · Updated 8 months ago
- Online compiler for HIP and NVIDIA® CUDA® code to WebGPU · ☆187 · Updated 6 months ago
- Reverse engineered Linux driver for the Apple Neural Engine (ANE) · ☆414 · Updated last year
- Exploring the scalable matrix extension of the Apple M4 processor · ☆184 · Updated 8 months ago
- throwaway GPT inference · ☆140 · Updated last year
- GGUF implementation in C as a library and a tools CLI program · ☆274 · Updated 6 months ago
- Tile primitives for speedy kernels · ☆2,517 · Updated this week
- Distributed Training Over-The-Internet · ☆945 · Updated 2 months ago
- TT-NN operator library, and TT-Metalium low level kernel programming model · ☆991 · Updated this week
- Implement llama3 inference step by step: grasp the core concepts, master the derivation, and implement the code · ☆594 · Updated 4 months ago
- JSON for Classic C++ · ☆730 · Updated 7 months ago
- Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA · ☆1,540 · Updated last week
- Exocompilation for productive programming of hardware accelerators · ☆640 · Updated this week
- Solve puzzles to improve your tinygrad skills! · ☆135 · Updated 4 months ago
- VS Code extension for LLM-assisted code/text completion · ☆835 · Updated last week