mikex86 / LibreCuda
☆1,030 · Updated 4 months ago
Alternatives and similar repositories for LibreCuda:
Users interested in LibreCuda are comparing it to the libraries listed below.
- NVIDIA Linux open GPU with P2P support ☆1,063 · Updated 3 months ago
- Docker-based inference engine for AMD GPUs ☆230 · Updated 5 months ago
- Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, an… ☆1,294 · Updated last month
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆734 · Updated last week
- ☆186 · Updated 7 months ago
- ☆438 · Updated 2 weeks ago
- ☆242 · Updated last year
- Vim plugin for LLM-assisted code/text completion ☆1,304 · Updated 2 weeks ago
- Nvidia Instruction Set Specification Generator ☆253 · Updated 8 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆342 · Updated last month
- Algebraic enhancements for GEMM & AI accelerators ☆274 · Updated last month
- Tile primitives for speedy kernels ☆2,184 · Updated this week
- SCUDA is a GPU-over-IP bridge allowing GPUs on remote machines to be attached to CPU-only machines. ☆1,682 · Updated 2 weeks ago
- Online compiler for HIP and NVIDIA® CUDA® code to WebGPU ☆142 · Updated 2 months ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆169 · Updated 4 months ago
- Things you can do with the token embeddings of an LLM ☆1,431 · Updated this week
- Apple AMX Instruction Set ☆1,059 · Updated 3 months ago
- Richard is gaining power ☆184 · Updated 4 months ago
- llama3.np is a pure NumPy implementation of the Llama 3 model. ☆977 · Updated 9 months ago
- GGUF implementation in C as a library and a CLI tools program ☆262 · Updated 2 months ago
- Connect home devices into a powerful cluster to accelerate LLM inference. More devices mean faster inference. ☆1,993 · Updated this week
- A modern model graph visualizer and debugger ☆1,152 · Updated this week
- An implementation of bucketMul LLM inference ☆215 · Updated 8 months ago
- HIPIFY: Convert CUDA to Portable C++ Code ☆565 · Updated this week
- A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full… ☆604 · Updated last week
- LLM-powered lossless compression tool ☆274 · Updated 7 months ago
- Minimal LLM inference in Rust ☆980 · Updated 5 months ago
- Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild ☆2,173 · Updated this week
- llama.cpp fork with additional SOTA quants and improved performance ☆222 · Updated this week
- Solve Puzzles. Learn Metal 🤘 ☆544 · Updated 6 months ago