t4minka / ccml
☆61 · Updated this week
Related projects:
- Fast multi-threaded matrix multiplication in C ☆164 · Updated 3 weeks ago
- LLM training in simple, raw C/CUDA ☆79 · Updated 4 months ago
- GPU benchmark ☆35 · Updated 2 weeks ago
- Inference of Mamba models in pure C ☆176 · Updated 6 months ago
- Minimal C++ implementation of GPT2 ☆39 · Updated last year
- Make triton easier ☆39 · Updated 3 months ago
- asynchronous/distributed speculative evaluation for llama3 ☆36 · Updated last month
- C++ raytracer that supports custom models. Supports running the calculations on the CPU using C++11 threads or on the GPU via CUDA. ☆74 · Updated last year
- A collection of Fast Fourier Transform algorithms implemented in C++20. ☆107 · Updated 9 months ago
- throwaway GPT inference ☆139 · Updated 3 months ago
- Inference Llama 2 in C++ ☆47 · Updated 4 months ago
- JAX-like function transformation engine but micro, microjax ☆24 · Updated 3 weeks ago
- A package for defining deep learning models using categorical algebraic expressions. ☆53 · Updated last month
- Experiments with BitNet inference on CPU ☆46 · Updated 5 months ago
- Attention in SRAM on Tenstorrent Grayskull ☆22 · Updated 2 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆82 · Updated 3 weeks ago
- Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023) ☆74 · Updated 8 months ago
- Port of Karpathy's micrograd in pure C. Micrograd is a tiny scalar-valued autograd engine and a neural net library on top of it with PyTo… ☆25 · Updated last month
- Simple and fast low-bit matmul kernels in CUDA ☆48 · Updated this week
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, in pure C. ☆21 · Updated 2 months ago
- Documented and Unit Tested educational Deep Learning framework with Autograd from scratch. ☆103 · Updated 5 months ago
- A tiny deep learning library written in Java ☆24 · Updated last year
- RWKV, in easy to read code ☆52 · Updated 5 months ago
- Simple and readable code for training and sampling from diffusion models ☆193 · Updated last week
- Personal solutions to the Triton Puzzles ☆11 · Updated 2 months ago
- GGML implementation of BERT model with Python bindings and quantization. ☆51 · Updated 7 months ago
- Autograd to GPT-2 completely from scratch ☆104 · Updated last month
- Attempt at Neuralink's Compression Challenge ☆85 · Updated 3 months ago
- LLM inference in Fortran ☆54 · Updated 3 months ago
- Implementation of Spectral State Space Models ☆16 · Updated 6 months ago