mikex86 / LibreCuda
☆1,040 Updated 2 weeks ago
Alternatives and similar repositories for LibreCuda
Users that are interested in LibreCuda are comparing it to the libraries listed below
- Nvidia Instruction Set Specification Generator ☆267 Updated 10 months ago
- NVIDIA Linux open GPU with P2P support ☆1,155 Updated this week
- ☆443 Updated last month
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆351 Updated last month
- llama3.np is a pure NumPy implementation for Llama 3 model. ☆981 Updated last month
- Apple AMX Instruction Set ☆1,086 Updated 5 months ago
- Docker-based inference engine for AMD GPUs ☆230 Updated 7 months ago
- Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, an… ☆1,381 Updated 3 weeks ago
- ☆243 Updated last year
- GPUOcelot: A dynamic compilation framework for PTX ☆192 Updated 3 months ago
- ☆187 Updated 9 months ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆176 Updated 6 months ago
- Algebraic enhancements for GEMM & AI accelerators ☆276 Updated 3 months ago
- Tile primitives for speedy kernels ☆2,420 Updated this week
- Richard is gaining power ☆187 Updated 6 months ago
- HIPIFY: Convert CUDA to Portable C++ Code ☆585 Updated this week
- Solve Puzzles. Learn Metal 🤘 ☆552 Updated 8 months ago
- Llama 2 Everywhere (L2E) ☆1,517 Updated 4 months ago
- An implementation of bucketMul LLM inference ☆217 Updated 11 months ago
- A modern model graph visualizer and debugger ☆1,212 Updated this week
- Online compiler for HIP and NVIDIA® CUDA® code to WebGPU ☆150 Updated 4 months ago
- Thunder gives you PyTorch models superpowers for training and inference. Unlock out-of-the-box optimizations for performance, memory and … ☆1,357 Updated this week
- throwaway GPT inference ☆139 Updated last year
- SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines. ☆1,724 Updated last month
- Playing around "Less Slow" coding practices in C++ 20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception ha… ☆1,778 Updated 2 weeks ago
- VS Code extension for LLM-assisted code/text completion ☆774 Updated this week
- port of Andrej Karpathy's llm.c to Mojo ☆352 Updated 5 months ago
- ☆192 Updated 3 weeks ago
- Felafax is building AI infra for non-NVIDIA GPUs ☆561 Updated 4 months ago
- Exocompilation for productive programming of hardware accelerators ☆607 Updated last week