mikex86 / LibreCuda
☆1,034 Updated 4 months ago
Alternatives and similar repositories for LibreCuda:
Users who are interested in LibreCuda are comparing it to the libraries listed below.
- ☆186 Updated 7 months ago
- NVIDIA Linux open GPU with P2P support ☆1,094 Updated 4 months ago
- ☆242 Updated last year
- Tile primitives for speedy kernels ☆2,259 Updated this week
- Felafax is building AI infra for non-NVIDIA GPUs ☆558 Updated 2 months ago
- Richard is gaining power ☆184 Updated 4 months ago
- Algebraic enhancements for GEMM & AI accelerators ☆274 Updated last month
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆347 Updated this week
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆742 Updated 2 weeks ago
- ☆441 Updated 2 weeks ago
- Apple AMX Instruction Set ☆1,069 Updated 3 months ago
- Nvidia Instruction Set Specification Generator ☆255 Updated 9 months ago
- Docker-based inference engine for AMD GPUs ☆230 Updated 6 months ago
- llama3.np is a pure NumPy implementation for Llama 3 model. ☆981 Updated 10 months ago
- GGUF implementation in C as a library and a tools CLI program ☆264 Updated 3 months ago
- SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines. ☆1,697 Updated last week
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆282 Updated 3 months ago
- An implementation of bucketMul LLM inference ☆216 Updated 9 months ago
- JSON for Classic C++ ☆712 Updated 4 months ago
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator ☆209 Updated last year
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆779 Updated 3 months ago
- Apple GPU microarchitecture ☆515 Updated 6 months ago
- Distributed Training Over-The-Internet ☆901 Updated 4 months ago
- Online compiler for HIP and NVIDIA® CUDA® code to WebGPU ☆143 Updated 3 months ago
- A JAX research toolkit for building, editing, and visualizing neural networks. ☆1,755 Updated last week
- throwaway GPT inference ☆138 Updated 10 months ago
- LLM-powered lossless compression tool ☆279 Updated 8 months ago
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆445 Updated this week
- Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, an… ☆1,319 Updated 3 weeks ago
- Learning how to write "Less Slow" code in C++ 20, C 99, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception hand… ☆516 Updated last week