KhoomeiK / complexity-scalingLinks

gzip Predicts Data-dependent Scaling Laws

☆34

Alternatives and similar repositories for complexity-scaling

Users that are interested in complexity-scaling are comparing it to the libraries listed below

Sorting:

okarthikb / state-space-models
☆28Updated last year
Aleph-Alpha-Research / scaling
Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr…
☆66Updated 2 weeks ago
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆135Updated 11 months ago
epfml / DenseFormer
☆82Updated last year
NousResearch / StripedHyenaTrainer
☆62Updated last year
google-deepmind / mishax
☆144Updated 3 months ago
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆87Updated 3 years ago
dvruette / barrel-rec-pytorch
☆53Updated last year
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆174Updated 5 months ago
EleutherAI / improved-t5
Experiments for efforts to train a new and improved t5
☆76Updated last year
taufeeque9 / codebook-features
Sparse and discrete interpretability tool for neural networks
☆64Updated last year
epfml / llm-baselines
nanoGPT-like codebase for LLM training
☆112Updated last month
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024)
☆195Updated last year
shikaiqiu / compute-better-spent
☆62Updated last year
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆243Updated 2 months ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆130Updated 3 years ago
cloneofsimo / min-fsdp
☆91Updated last year
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆53Updated 2 years ago
abacaj / train-with-fsdp
☆94Updated 2 years ago
johnma2006 / candle
Deep learning library implemented from scratch in numpy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments.
☆53Updated last year
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆92Updated last year
google-deepmind / neural_networks_solomonoff_induction
Learning Universal Predictors
☆81Updated last year
lucidrains / llama-qrlhf
Implementation of the Llama architecture with RLHF + Q-learning
☆168Updated 10 months ago
google-deepmind / asyncdiloco
☆47Updated last year
berlino / seq_icl
☆53Updated last year
LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆105Updated 4 months ago
srush / do-we-need-attention
☆166Updated 2 years ago
ClashLuke / SOAP
☆21Updated last year
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆108Updated 9 months ago
EleutherAI / features-across-time
Understanding how features learned by neural networks evolve throughout training
☆39Updated last year