salykova / sgemm.c
Multi-Threaded FP32 Matrix Multiplication on x86 CPUs
☆377 · Updated Apr 21, 2025
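For context, SGEMM is the BLAS single-precision routine that computes C = alpha·A·B + beta·C. The sketch below is a minimal, single-threaded reference version of that operation, assuming row-major storage; the function name and signature are illustrative only and are not the API of sgemm.c, whose kernel adds cache blocking, SIMD micro-kernels, and multi-threading on top of this baseline.

```c
#include <stddef.h>

/* Naive reference SGEMM: C = alpha * A * B + beta * C.
 * A is M x K, B is K x N, C is M x N, all row-major.
 * Correctness baseline only; not the optimized kernel from sgemm.c. */
static void sgemm_ref(size_t M, size_t N, size_t K,
                      float alpha, const float *A, const float *B,
                      float beta, float *C)
{
    for (size_t i = 0; i < M; i++) {
        for (size_t j = 0; j < N; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < K; k++)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```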
Alternatives and similar repositories for sgemm.c
Users interested in sgemm.c compare it to the libraries listed below.
- High-Performance FP32 GEMM on CUDA devices · ☆117 · Updated Jan 21, 2025
- Inference of Mamba and Mamba2 models in pure C · ☆196 · Updated Jan 22, 2026
- A tiny multidimensional array implementation in C, similar to NumPy, but in only one file · ☆225 · Updated Aug 2, 2024
- This project records the process of optimizing SGEMM (single-precision floating-point General Matrix Multiplication) on the RISC-V platfor… · ☆24 · Updated Dec 11, 2024
- Fast CUDA matrix multiplication from scratch · ☆1,046 · Updated Sep 2, 2025
- Creating a tiny tensor library in raw C · ☆1,306 · Updated Mar 5, 2025
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model · ☆350 · Updated Apr 27, 2025
- ☆16 · Updated Sep 30, 2021
- Haskell port of the Tensor Algebra COmpiler · ☆16 · Updated Nov 18, 2019
- BLAS-like Library Instantiation Software Framework · ☆2,603 · Updated Nov 11, 2025
- Load and run Llama from safetensors files in C · ☆15 · Updated Oct 24, 2024
- Modified Mamba code to run on CPU · ☆30 · Updated Jan 14, 2024
- Fast probabilistic symmetry detection on graphs · ☆20 · Updated Feb 4, 2026
- LD_PRELOADable library for exploring the glibc heap · ☆108 · Updated Mar 6, 2025
- Inference Llama 2 in one file of pure C · ☆19,162 · Updated Aug 6, 2024
- Project code for training LLMs to write better unit tests + code · ☆21 · Updated May 19, 2025
- Learning about CUDA by writing PTX code · ☆152 · Updated Feb 27, 2024
- C++20 Memory Allocator library · ☆36 · Updated Apr 30, 2025
- SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT · ☆807 · Updated Dec 25, 2025
- World's first Nintendo 3DS emulator for Apple devices, based on Citra · ☆18 · Updated Apr 7, 2023
- Small auto-grad engine inspired by Karpathy's micrograd and PyTorch · ☆276 · Updated Nov 21, 2024
- llama3.np is a pure NumPy implementation of the Llama 3 model · ☆991 · Updated Apr 27, 2025
- A MIPS simulator written in Rust · ☆19 · Updated Apr 1, 2019
- Algorithms for matrix-matrix multiplication, dgemm, AVX-256, AVX-512 · ☆24 · Updated Jan 15, 2025
- Row-major matmul optimization · ☆701 · Updated Aug 20, 2025
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS · ☆251 · Updated May 6, 2025
- In this repository, I'm going to implement increasingly complex LLM inference optimizations · ☆83 · Updated May 22, 2025
- Official codebase for our paper "Do Language Models Use Their Depth Efficiently?" · ☆29 · Updated Jun 25, 2025
- Mantella spell mod for Skyrim VR / AE / SE · ☆16 · Updated Dec 15, 2025
- Musings in GEMM (General Matrix Multiplication) · ☆14 · Updated Dec 14, 2025
- C Compiler written in WASI · ☆11 · Updated Jun 14, 2020
- GGUF implementation in C as a library and a tools CLI program · ☆303 · Updated Aug 28, 2025
- A tool for formally verifying constant-time software against hardware 🕰️ · ☆13 · Updated Feb 1, 2025
- Yet another `llama.cpp` Rust wrapper · ☆12 · Updated Jun 19, 2024
- Universal instruction selection · ☆12 · Updated Jun 8, 2018
- OpenVDB Support for Mitsuba · ☆11 · Updated May 26, 2014
- Lightweight, standalone C++ inference engine for Google's Gemma models · ☆6,728 · Updated this week
- Performance-portable, length-agnostic SIMD with runtime dispatch · ☆5,317 · Updated Jan 29, 2026
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain · ☆147 · Updated Feb 4, 2026