mlecauchois / micrograd-cuda

☆249

Alternatives and similar repositories for micrograd-cuda

Users who are interested in micrograd-cuda are comparing it to the libraries listed below.

a1k0n / a1gpt
throwaway GPT inference
☆140 Updated last year
valine / training-hot-swap
PyTorch script hot swap: Change code without unloading your LLM from VRAM
☆126 Updated 3 months ago
joennlae / tensorli
Absolute minimalistic implementation of a GPT-like transformer using only numpy (<650 lines).
☆253 Updated last year
DiscoGrad / DiscoGrad
DiscoGrad - automatically differentiate across conditional branches in C++ programs
☆204 Updated 10 months ago
ivanbelenky / RL
R.L. methods and techniques.
☆199 Updated 8 months ago
trevorpogue / algebraic-nnhw
Algebraic enhancements for GEMM & AI accelerators
☆278 Updated 5 months ago
samvher / bert-for-laptops
A BERT that you can train on a (gaming) laptop.
☆209 Updated last year
robjinman / richard
Richard is gaining power
☆196 Updated last month
PaulPauls / llama3_interpretability_sae
A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full…
☆622 Updated 4 months ago
idoh / mamba.np
A pure NumPy implementation of Mamba.
☆223 Updated last year
kolinko / effort
An implementation of bucketMul LLM inference
☆221 Updated last year
nirw4nna / dsc
Tensor library & inference framework for machine learning
☆106 Updated 3 weeks ago
yousef-rafat / miniDiffusion
A reimplementation of Stable Diffusion 3.5 in pure PyTorch
☆652 Updated last month
adamkarvonen / chess_llm_interpretability
Visualizing the internal board state of a GPT trained on chess PGN strings, and performing interventions on its internal board state and …
☆208 Updated 8 months ago
anordin95 / run-llama-locally
Run and explore Llama models locally with minimal dependencies on CPU
☆191 Updated 9 months ago
joennlae / halutmatmul
Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator
☆211 Updated last year
rentruewang / bocoel
Bayesian Optimization as a Coverage Tool for Evaluating LLMs. Accurate evaluation (benchmarking) that's 10 times faster with just a few l…
☆286 Updated last month
salykova / sgemm.c
Multi-Threaded FP32 Matrix Multiplication on x86 CPUs
☆350 Updated 3 months ago
google-deepmind / searchless_chess
Grandmaster-Level Chess Without Search
☆585 Updated 6 months ago
facebookresearch / searchformer
Official codebase for the paper "Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping".
☆372 Updated last year
slashml / amd_inference
Docker-based inference engine for AMD GPUs
☆231 Updated 9 months ago
felafax / felafax
Felafax is building AI infra for non-NVIDIA GPUs
☆566 Updated 6 months ago
Foreseerr / TScale
☆196 Updated 3 months ago
ScalingIntelligence / tokasaurus
☆388 Updated last week
Cerebras / gigaGPT
a small code base for training large models
☆307 Updated 3 months ago
google-deepmind / recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
☆645 Updated 2 months ago
joelburget / microjax
A tiny autograd engine with a Jax-like API
☆74 Updated last month
bclarkson-code / Tricycle
Autograd to GPT-2 completely from scratch
☆115 Updated 3 months ago
valine / NeuralFlow
Visualize the intermediate output of Mistral 7B
☆367 Updated 6 months ago
AMD-AIG-AIMA / AMD-LLM
☆188 Updated 11 months ago