t4minka / ccml

☆61

Related projects:

salykova / matmul.c
Fast multi-threaded matrix multiplication in C
☆164 · Updated 3 weeks ago
gevtushenko / llm.c
LLM training in simple, raw C/CUDA
☆79 · Updated 4 months ago
mag- / gpu_benchmark
GPU benchmark
☆35 · Updated 2 weeks ago
kroggen / mamba.c
Inference of Mamba models in pure C
☆176 · Updated 6 months ago
davidar / eigenGPT
Minimal C++ implementation of GPT-2
☆39 · Updated last year
UmerHA / triton_util
Make Triton easier
☆39 · Updated 3 months ago
okuvshynov / llama_duo
Asynchronous/distributed speculative evaluation for Llama 3
☆36 · Updated last month
maxilevi / raytracer
C++ raytracer that supports custom models. Calculations can run on the CPU using C++11 threads or on the GPU via CUDA.
☆74 · Updated last year
bisqwit / fft
A collection of Fast Fourier Transform algorithms implemented in C++20.
☆107 · Updated 9 months ago
a1k0n / a1gpt
throwaway GPT inference
☆139 · Updated 3 months ago
AmeyaWagh / llama2.cpp
Inference Llama 2 in C++
☆47 · Updated 4 months ago
joey00072 / microjax
JAX-like function transformation engine, but micro: microjax
☆24 · Updated 3 weeks ago
vtabbott / Algebraic-NCD
A package for defining deep learning models using categorical algebraic expressions.
☆53 · Updated last month
catid / bitnet_cpu
Experiments with BitNet inference on CPU
☆46 · Updated 5 months ago
moritztng / grayskull-attention
Attention in SRAM on Tenstorrent Grayskull
☆22 · Updated 2 months ago
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients"
☆82 · Updated 3 weeks ago
glassroom / heinsen_sequence
Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023)
☆74 · Updated 8 months ago
Jaykef / micrograd.c
Port of Karpathy's micrograd in pure C. Micrograd is a tiny scalar-valued autograd engine and a neural net library on top of it with PyTo…
☆25 · Updated last month
mobiusml / gemlite
Simple and fast low-bit matmul kernels in CUDA
☆48 · Updated this week
shivance / minbpe.c
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, in pure C.
☆21 · Updated 2 months ago
eduardoleao052 / Autograd-from-scratch
Documented and unit-tested educational deep learning framework with autograd from scratch.
☆103 · Updated 5 months ago
mikex86 / scicore
A tiny deep learning library written in Java
☆24 · Updated last year
SmerkyG / RWKV_Explained
RWKV, in easy-to-read code
☆52 · Updated 5 months ago
yuanchenyang / smalldiffusion
Simple and readable code for training and sampling from diffusion models
☆193 · Updated last week
alexzhang13 / Triton-Puzzles-Solutions
Personal solutions to the Triton Puzzles
☆11 · Updated 2 months ago
iamlemec / bert.cpp
GGML implementation of BERT model with Python bindings and quantization.
☆51 · Updated 7 months ago
bclarkson-code / Tricycle
Autograd to GPT-2 completely from scratch
☆104 · Updated last month
phoboslab / neuralink_brainwire
Attempt at Neuralink's Compression Challenge
☆85 · Updated 3 months ago
rbitr / llm.f90
LLM inference in Fortran
☆54 · Updated 3 months ago
catid / spectral_ssm
Implementation of Spectral State Space Models
☆16 · Updated 6 months ago