mikex86 / LibreCuda — Links
☆1,041 · Updated last month
Alternatives and similar repositories for LibreCuda
Users interested in LibreCuda are comparing it to the repositories listed below.
- NVIDIA Linux open GPU with P2P support ☆1,175 · Updated 2 weeks ago
- ☆447 · Updated 2 months ago
- ☆248 · Updated last year
- Nvidia Instruction Set Specification Generator ☆278 · Updated 11 months ago
- Algebraic enhancements for GEMM & AI accelerators ☆277 · Updated 3 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆352 · Updated 2 months ago
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆782 · Updated this week
- Online compiler for HIP and NVIDIA® CUDA® code to WebGPU ☆181 · Updated 5 months ago
- ☆187 · Updated 9 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆194 · Updated 4 months ago
- Tile primitives for speedy kernels ☆2,465 · Updated this week
- Tutorials on tinygrad ☆385 · Updated this week
- llama.cpp fork with additional SOTA quants and improved performance ☆608 · Updated this week
- Richard is gaining power ☆189 · Updated this week
- Docker-based inference engine for AMD GPUs ☆231 · Updated 8 months ago
- llama3.np: a pure NumPy implementation of the Llama 3 model ☆984 · Updated last month
- JSON for Classic C++ ☆726 · Updated 6 months ago
- VS Code extension for LLM-assisted code/text completion ☆807 · Updated this week
- Apple AMX Instruction Set ☆1,093 · Updated 5 months ago
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator ☆211 · Updated last year
- Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, an… ☆1,407 · Updated 2 weeks ago
- A reimplementation of Stable Diffusion 3.5 in pure PyTorch ☆576 · Updated last week
- Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA ☆1,322 · Updated this week
- GGUF implementation in C, as a library and a CLI tool ☆273 · Updated 5 months ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆178 · Updated 7 months ago
- throwaway GPT inference ☆140 · Updated last year
- HIPIFY: Convert CUDA to Portable C++ Code ☆587 · Updated this week
- Implement Llama 3 inference step by step: grasp the core concepts, follow the derivation, and write the code. ☆592 · Updated 4 months ago
- ☆196 · Updated last month
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs. ☆284 · Updated this week