mikex86 / LibreCuda
☆1,044 Updated 2 months ago
Alternatives and similar repositories for LibreCuda
Users that are interested in LibreCuda are comparing it to the libraries listed below
- ☆449 Updated 3 months ago
- NVIDIA Linux open GPU with P2P support ☆1,200 Updated last month
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆350 Updated 3 months ago
- ☆249 Updated last year
- Richard is gaining power ☆196 Updated last month
- ☆188 Updated 11 months ago
- Online compiler for HIP and NVIDIA® CUDA® code to WebGPU ☆190 Updated 6 months ago
- Apple AMX Instruction Set ☆1,121 Updated 7 months ago
- Nvidia Instruction Set Specification Generator ☆285 Updated last year
- Algebraic enhancements for GEMM & AI accelerators ☆278 Updated 5 months ago
- llama3.np is a pure NumPy implementation for Llama 3 model. ☆987 Updated 3 months ago
- throwaway GPT inference ☆140 Updated last year
- GGUF implementation in C as a library and a tools CLI program ☆277 Updated 6 months ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆191 Updated 8 months ago
- TT-NN operator library, and TT-Metalium low level kernel programming model. ☆1,059 Updated this week
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆800 Updated this week
- GPUOcelot: A dynamic compilation framework for PTX ☆204 Updated 5 months ago
- port of Andrej Karpathy's llm.c to Mojo ☆353 Updated last week
- LLM-powered lossless compression tool ☆285 Updated 11 months ago
- LLM training in simple, raw C/HIP for AMD GPUs ☆50 Updated 10 months ago
- Tutorials on tinygrad ☆396 Updated last month
- Exocompilation for productive programming of hardware accelerators ☆650 Updated this week
- Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator ☆211 Updated last year
- A modern model graph visualizer and debugger ☆1,293 Updated this week
- Solve Puzzles. Learn Metal 🤘 ☆574 Updated 10 months ago
- HIPIFY: Convert CUDA to Portable C++ Code ☆604 Updated this week
- AI Tensor Engine for ROCm ☆243 Updated this week
- Tenstorrent TT-BUDA Repository ☆314 Updated 4 months ago
- Tile primitives for speedy kernels ☆2,541 Updated this week
- Achieve the llama3 inference step-by-step, grasp the core concepts, master the process derivation, implement the code. ☆604 Updated 5 months ago