mikex86 / LibreCuda
☆1,074 · Updated 8 months ago
Alternatives and similar repositories for LibreCuda
Users interested in LibreCuda are comparing it to the libraries listed below.
- ☆451 · Updated 9 months ago
- NVIDIA Linux open GPU with P2P support · ☆1,316 · Updated 7 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs · ☆376 · Updated 9 months ago
- ☆250 · Updated last year
- Online compiler for HIP and NVIDIA® CUDA® code to WebGPU · ☆205 · Updated last year
- ☆191 · Updated last year
- Nvidia Instruction Set Specification Generator · ☆310 · Updated last year
- Algebraic enhancements for GEMM & AI accelerators · ☆286 · Updated 11 months ago
- llama3.np is a pure NumPy implementation of the Llama 3 model. · ☆992 · Updated 9 months ago
- Richard is gaining power · ☆200 · Updated 7 months ago
- throwaway GPT inference · ☆141 · Updated last year
- SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines. · ☆1,800 · Updated 3 weeks ago
- Apple AMX Instruction Set · ☆1,190 · Updated last year
- Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, an… · ☆1,645 · Updated this week
- CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-base… · ☆804 · Updated 2 weeks ago
- Exploring the scalable matrix extension of the Apple M4 processor · ☆219 · Updated last year
- GPUOcelot: A dynamic compilation framework for PTX · ☆219 · Updated 11 months ago
- GGUF implementation in C as a library and a tools CLI program · ☆301 · Updated 5 months ago
- Fast and Furious AMD Kernels · ☆346 · Updated last week
- A modern model graph visualizer and debugger · ☆1,376 · Updated last week
- Exocompilation for productive programming of hardware accelerators · ☆706 · Updated last week
- LLM training in simple, raw C/HIP for AMD GPUs · ☆57 · Updated last year
- Llama 2 Everywhere (L2E) · ☆1,527 · Updated 5 months ago
- ctypes wrappers for HIP, CUDA, and OpenCL · ☆130 · Updated last year
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator · ☆215 · Updated 2 years ago
- TT-NN operator library, and TT-Metalium low level kernel programming model. · ☆1,332 · Updated this week
- CUDA/Metal accelerated language model inference · ☆625 · Updated 8 months ago
- An implementation of bucketMul LLM inference · ☆224 · Updated last year
- Distributed Training Over-The-Internet · ☆975 · Updated 3 months ago
- HIPIFY: Convert CUDA to Portable C++ Code · ☆653 · Updated this week