mikex86 / LibreCuda
☆1,026 Updated 3 months ago
Alternatives and similar repositories for LibreCuda:
Users who are interested in LibreCuda are comparing it to the repositories listed below.
- Apple AMX Instruction Set ☆1,054 Updated 2 months ago
- NVIDIA Linux open GPU with P2P support ☆1,044 Updated 2 months ago
- ☆186 Updated 6 months ago
- ☆432 Updated 3 months ago
- ☆242 Updated 11 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆340 Updated 3 weeks ago
- Nvidia Instruction Set Specification Generator ☆254 Updated 8 months ago
- JSON for Classic C++ ☆701 Updated 3 months ago
- Algebraic enhancements for GEMM & AI accelerators ☆265 Updated last week
- Docker-based inference engine for AMD GPUs ☆229 Updated 5 months ago
- Richard is gaining power ☆184 Updated 3 months ago
- Vim plugin for LLM-assisted code/text completion ☆1,239 Updated this week
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆717 Updated this week
- Felafax is building AI infra for non-NVIDIA GPUs ☆555 Updated last month
- Minimal LLM inference in Rust ☆977 Updated 4 months ago
- GGUF implementation in C as a library and a tools CLI program ☆260 Updated 2 months ago
- (WIP) A small but powerful, homemade PyTorch from scratch. ☆534 Updated last week
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆126 Updated this week
- Tile primitives for speedy kernels ☆2,130 Updated this week
- Exploring the scalable matrix extension of the Apple M4 processor ☆165 Updated 4 months ago
- Apple GPU microarchitecture ☆502 Updated 5 months ago
- VS Code extension for LLM-assisted code/text completion ☆586 Updated this week
- A lightweight library for portable low-level GPU computation using WebGPU. ☆3,834 Updated this week
- Inference Llama models in one file of pure C for Windows 98 running on 25-year-old hardware ☆248 Updated 2 months ago
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs. ☆256 Updated this week
- llama.cpp fork with additional SOTA quants and improved performance ☆202 Updated this week
- ☆518 Updated 11 months ago
- Up to 200x Faster Dot Products & Similarity Metrics for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, an… ☆1,276 Updated 2 weeks ago