Shrimp-AI / shrimpgrad
Yet another tensor library
☆11 · Updated last year
Alternatives and similar repositories for shrimpgrad
Users interested in shrimpgrad are comparing it to the libraries listed below.
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆377 · Updated 9 months ago
- Tutorials about tinygrad, an end-to-end deep learning stack ☆89 · Updated 2 weeks ago
- Library for algorithmic trading ☆68 · Updated 2 years ago
- A floating point arithmetic which works with types of any mantissa, exponent or base in modern header-only C++. ☆83 · Updated last year
- C++ template metaprogram driven tensor math library ☆90 · Updated 2 weeks ago
- throwaway GPT inference ☆141 · Updated last year
- codes documenting my gpu learning journey ☆77 · Updated last month
- Tensor library & inference framework for machine learning ☆117 · Updated 4 months ago
- (WIP) A small but powerful, homemade PyTorch from scratch. ☆674 · Updated last week
- My curated collection of great sources of information for programming ☆107 · Updated 3 months ago
- Quantized LLM training in pure CUDA/C++. ☆238 · Updated 3 weeks ago
- Simple MPI implementation for prototyping or learning ☆300 · Updated 6 months ago
- TransformerCPP is a minimal C++ machine learning library with autograd and tensor ops, inspired by PyTorch. It includes a from-scratch Tr… ☆47 · Updated 3 months ago
- Learning about CUDA by writing PTX code. ☆152 · Updated last year
- ☆251 · Updated last year
- pytorch from scratch in pure C/CUDA and python ☆41 · Updated last year
- LLM training in simple, raw C/CUDA ☆112 · Updated last year
- C++20 Memory Allocator library ☆36 · Updated 9 months ago
- Experimental alternative to sender/receivers. ☆23 · Updated 3 months ago
- Visualization of cache-optimized matrix multiplication ☆157 · Updated 10 months ago
- A tiny autograd engine with a Jax-like API ☆74 · Updated 7 months ago
- FlameGraphs in Your App ☆33 · Updated last year
- Course notes for Alexander Stepanov's teachings on design and usage of C++ STL. ☆84 · Updated last year
- Exocompilation for productive programming of hardware accelerators ☆708 · Updated this week
- ☆88 · Updated 2 years ago
- Some CUDA example code with READMEs. ☆179 · Updated 3 months ago
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA ☆182 · Updated last year
- ctypes wrappers for HIP, CUDA, and OpenCL ☆130 · Updated last year
- minimal DL library in C: 24 NAIVE cuda/cpu ops, autodiff engine, python API (ops bindings/layers/models), tensor abstraction, strides, co… ☆56 · Updated last month
- High Level Algorithmic Skeleton CUDA Library ☆30 · Updated last year