KhoomeiK / complexity-scaling
gzip Predicts Data-dependent Scaling Laws
☆34 · Updated last year
Alternatives and similar repositories for complexity-scaling
Users interested in complexity-scaling are comparing it to the repositories listed below.
- ☆62 · Updated last year
- ☆142 · Updated 3 weeks ago
- ☆28 · Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ☆129 · Updated 9 months ago
- ☆69 · Updated last year
- Sparse and discrete interpretability tool for neural networks ☆63 · Updated last year
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr… ☆65 · Updated this week
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆193 · Updated last year
- ☆82 · Updated last year
- Experiments for efforts to train a new and improved T5 ☆76 · Updated last year
- Understanding how features learned by neural networks evolve throughout training ☆39 · Updated 11 months ago
- ☆29 · Updated last year
- Transformer with Mu-Parameterization, implemented in JAX/Flax. Supports FSDP on TPU pods. ☆32 · Updated 4 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆164 · Updated 3 months ago
- ☆53 · Updated last year
- ☆40 · Updated last year
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers. ☆19 · Updated 2 months ago
- Some common Huggingface transformers in maximal update parametrization (µP) ☆82 · Updated 3 years ago
- Your favourite classical machine learning algos on the GPU/TPU ☆20 · Updated 9 months ago
- A set of Python scripts that makes your experience on TPU better ☆54 · Updated 2 weeks ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Updated last year
- nanoGPT-like codebase for LLM training ☆107 · Updated 4 months ago
- Minimal (400 LOC) implementation of Maximum (multi-node, FSDP) GPT training ☆132 · Updated last year
- Simple GRPO scripts and configurations. ☆59 · Updated 8 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆84 · Updated 11 months ago
- ☆101 · Updated 9 months ago
- JAX implementation of the Mistral 7B v0.2 model ☆36 · Updated last year
- Learning Universal Predictors ☆79 · Updated last year
- ☆89 · Updated last year
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆80 · Updated 10 months ago