hiverge / cifar10-speedrunLinks

CIFAR-10 speedrun: Trains to 94% accuracy in 1.98 seconds on a single NVIDIA A100 GPU.

☆40

Alternatives and similar repositories for cifar10-speedrun

Users that are interested in cifar10-speedrun are comparing it to the libraries listed below

Sorting:

HazyResearch / train-tk
train with kittens!
☆63Updated last year
PrimeIntellect-ai / pi-quant
SIMD quantization kernels
☆91Updated 2 months ago
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆132Updated 10 months ago
yixiaoer / mistral-v0.2-jax
JAX implementation of the Mistral 7b v0.2 model
☆34Updated last year
LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆103Updated 3 months ago
modula-systems / modula
🧱 Modula software package
☆300Updated 2 months ago
m-a-n-i-f-e-s-t / power-attention
Attention Kernels for Symmetric Power Transformers
☆123Updated last month
jax-ml / jax-llm-examples
Minimal yet performant LLM examples in pure JAX
☆193Updated last month
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024)
☆193Updated last year
enjalot / latent-sae
Training code for Sparse Autoencoders on Embedding models
☆38Updated 8 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆171Updated 4 months ago
srush / mamba-primer
☆38Updated last year
yixiaoer / einshard
Einsum-like high-level array sharding API for JAX
☆34Updated last year
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆107Updated 8 months ago
KellerJordan / hlb-CIFAR10
Train to 94% on CIFAR-10 in 4.4 seconds on a single A100
☆12Updated last year
srush / do-we-need-attention
☆166Updated 2 years ago
PrimeIntellect-ai / pccl
PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP
☆138Updated last month
cgarciae / einop
☆60Updated 3 years ago
google-deepmind / nanodo
☆283Updated last year
nikhilvyas / SOAP
☆221Updated 11 months ago
srush / Tensor-Puzzles-Penzai
☆21Updated last year
xjdr-alt / simple_transformer
Simple Transformer in Jax
☆139Updated last year
KhoomeiK / complexity-scaling
gzip Predicts Data-dependent Scaling Laws
☆34Updated last year
joey00072 / microjax
Jax like function transformation engine but micro, microjax
☆33Updated last year
clement-bonnet / lpn
Latent Program Network (from the "Searching Latent Program Spaces" paper)
☆102Updated last month
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆79Updated last year
okarthikb / state-space-models
☆28Updated last year
divyamakkar0 / JAXformer
A zero-to-one guide on scaling modern transformers with n-dimensional parallelism.
☆104Updated last month
iliao2345 / CompressARC
☆198Updated 2 months ago
Silent-Zebra / twisted-smc-lm
☆32Updated 7 months ago