mikex86 / LibreCuda
☆1,055 · Updated 4 months ago
Alternatives and similar repositories for LibreCuda
Users who are interested in LibreCuda are comparing it to the libraries listed below.
- ☆449 · Updated 6 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆355 · Updated 5 months ago
- NVIDIA Linux open GPU with P2P support ☆1,255 · Updated 4 months ago
- ☆189 · Updated last year
- ☆248 · Updated last year
- Nvidia Instruction Set Specification Generator ☆293 · Updated last year
- Richard is gaining power ☆194 · Updated 3 months ago
- Apple AMX Instruction Set ☆1,152 · Updated 9 months ago
- Algebraic enhancements for GEMM & AI accelerators ☆280 · Updated 7 months ago
- GGUF implementation in C as a library and a tools CLI program ☆292 · Updated last month
- Online compiler for HIP and NVIDIA® CUDA® code to WebGPU ☆195 · Updated 8 months ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆204 · Updated 11 months ago
- llama3.np is a pure NumPy implementation for Llama 3 model. ☆990 · Updated 5 months ago
- Open-source LLMOps platform for hosting and scaling AI in your own infrastructure 🏓🦙 ☆1,315 · Updated 3 weeks ago
- Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, an… ☆1,513 · Updated this week
- Docker-based inference engine for AMD GPUs ☆230 · Updated last year
- A reimplementation of Stable Diffusion 3.5 in pure PyTorch ☆673 · Updated 3 months ago
- Solve Puzzles. Learn Metal 🤘 ☆587 · Updated last year
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator ☆213 · Updated last year
- throwaway GPT inference ☆140 · Updated last year
- GPUOcelot: A dynamic compilation framework for PTX ☆209 · Updated 7 months ago
- Reverse engineered Linux driver for the Apple Neural Engine (ANE). ☆424 · Updated last year
- Felafax is building AI infra for non-NVIDIA GPUs ☆567 · Updated 8 months ago
- Exocompilation for productive programming of hardware accelerators ☆667 · Updated this week
- HIPIFY: Convert CUDA to Portable C++ Code ☆625 · Updated this week
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers. ☆369 · Updated last week
- Llama 2 Everywhere (L2E) ☆1,524 · Updated last month
- Tutorials on tinygrad ☆414 · Updated last week
- A minimal tensor processing unit (TPU), inspired by Google's TPU V2 and V1 ☆953 · Updated last month
- LLM training in simple, raw C/HIP for AMD GPUs ☆51 · Updated last year