mikex86 / LibreCuda
☆1,072 Updated 7 months ago
Alternatives and similar repositories for LibreCuda
Users who are interested in LibreCuda are comparing it to the libraries listed below.
- ☆449 Updated 8 months ago
- NVIDIA Linux open GPU with P2P support ☆1,299 Updated 6 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆370 Updated 7 months ago
- Apple AMX Instruction Set ☆1,173 Updated 11 months ago
- ☆191 Updated last year
- ☆249 Updated last year
- Richard is gaining power ☆200 Updated 5 months ago
- Nvidia Instruction Set Specification Generator ☆304 Updated last year
- Online compiler for HIP and NVIDIA® CUDA® code to WebGPU ☆204 Updated 11 months ago
- Fast and Furious AMD Kernels ☆321 Updated this week
- llama3.np is a pure NumPy implementation for Llama 3 model. ☆992 Updated 7 months ago
- Algebraic enhancements for GEMM & AI accelerators ☆282 Updated 9 months ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆213 Updated last year
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆1,279 Updated this week
- Tile primitives for speedy kernels ☆3,008 Updated last week
- GGUF implementation in C as a library and a tools CLI program ☆296 Updated 3 months ago
- Llama 2 Everywhere (L2E) ☆1,522 Updated 3 months ago
- throwaway GPT inference ☆141 Updated last year
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator ☆215 Updated 2 years ago
- Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, an… ☆1,604 Updated this week
- HIPIFY: Convert CUDA to Portable C++ Code ☆637 Updated this week
- Exocompilation for productive programming of hardware accelerators ☆693 Updated last week
- Docker-based inference engine for AMD GPUs ☆230 Updated last year
- LLM training in simple, raw C/HIP for AMD GPUs ☆56 Updated last year
- A modern model graph visualizer and debugger ☆1,349 Updated last week
- GPUOcelot: A dynamic compilation framework for PTX ☆219 Updated 10 months ago
- Reverse engineered Linux driver for the Apple Neural Engine (ANE). ☆442 Updated last year
- Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML) ☆569 Updated 2 years ago
- CUDA/Metal accelerated language model inference ☆625 Updated 6 months ago
- Open weights language model from Google DeepMind, based on Griffin. ☆656 Updated 6 months ago