mikex86 / LibreCuda
☆1,066 · Updated 6 months ago
Alternatives and similar repositories for LibreCuda
Users who are interested in LibreCuda are comparing it to the repositories listed below
- NVIDIA Linux open GPU with P2P support ☆1,285 · Updated 5 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆367 · Updated 7 months ago
- ☆447 · Updated 7 months ago
- ☆248 · Updated last year
- ☆190 · Updated last year
- Apple AMX Instruction Set ☆1,173 · Updated 11 months ago
- Richard is gaining power ☆198 · Updated 5 months ago
- Nvidia Instruction Set Specification Generator ☆298 · Updated last year
- llama3.np is a pure NumPy implementation for Llama 3 model. ☆993 · Updated 7 months ago
- Exocompilation for productive programming of hardware accelerators ☆683 · Updated this week
- Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, an… ☆1,583 · Updated 2 weeks ago
- throwaway GPT inference ☆140 · Updated last year
- Online compiler for HIP and NVIDIA® CUDA® code to WebGPU ☆201 · Updated 10 months ago
- Algebraic enhancements for GEMM & AI accelerators ☆282 · Updated 8 months ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆211 · Updated last year
- Fast and Furious AMD Kernels ☆298 · Updated this week
- A modern model graph visualizer and debugger ☆1,339 · Updated this week
- GGUF implementation in C as a library and a tools CLI program ☆296 · Updated 2 months ago
- Felafax is building AI infra for non-NVIDIA GPUs ☆569 · Updated 10 months ago
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator ☆214 · Updated last year
- Docker-based inference engine for AMD GPUs ☆230 · Updated last year
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers. ☆404 · Updated this week
- Tile primitives for speedy kernels ☆2,937 · Updated last week
- LLM training in simple, raw C/HIP for AMD GPUs ☆54 · Updated last year
- GPUOcelot: A dynamic compilation framework for PTX ☆216 · Updated 9 months ago
- Tutorials on tinygrad ☆439 · Updated last month
- An implementation of bucketMul LLM inference ☆223 · Updated last year
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆1,265 · Updated this week
- port of Andrej Karpathy's llm.c to Mojo ☆360 · Updated 3 months ago
- NVIDIA Math Libraries for the Python Ecosystem ☆535 · Updated last week