Laz4rz/mup

Minimal (truly) muP implementation, consistent with the notation of the TP4 and TP5 papers

☆14

Alternatives and similar repositories for mup

Users that are interested in mup are comparing it to the libraries listed below.

cloneofsimo / project_RF
☆24 · Jun 4, 2024 · Updated 2 years ago
PeterWang512 / AttributeByUnlearning
Code for the paper "Data Attribution for Text-to-Image Models by Unlearning Synthesized Images."
☆17 · May 23, 2025 · Updated last year
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23 · Jun 3, 2025 · Updated last year
cloneofsimo / min-max-in-dit
☆27 · May 3, 2024 · Updated 2 years ago
alexjc / nanogpt-speedrun
NanoGPT (124M) in 5 minutes
☆16 · Feb 14, 2025 · Updated last year
GSYfate / knnlm-limits
Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs"
☆24 · Apr 30, 2025 · Updated last year
misko / human_descent
☆41 · Dec 12, 2025 · Updated 7 months ago
cloneofsimo / ezmup
Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam.
☆88 · Jul 28, 2024 · Updated last year
hao-ai-lab / cse234-w25-PA
☆52 · Mar 14, 2025 · Updated last year
sail-sg / D-TRAK
Intriguing Properties of Data Attribution on Diffusion Models (ICLR 2024)
☆39 · Jan 23, 2024 · Updated 2 years ago
hao-ai-lab / cse234-w25
Website for CSE 234, Winter 2025
☆16 · Mar 24, 2025 · Updated last year
wozeparrot / tinygrad-on-esp32
run tinygrad kernels on esp32
☆14 · Nov 28, 2023 · Updated 2 years ago
AIS-Bonn / ExtrCamCalib_PersonKeypoints
Official implementation of: "Online Marker-free Extrinsic Camera Calibration using Person Keypoint Detections" by Pätzold, Bultmann & Beh…
☆23 · Feb 1, 2024 · Updated 2 years ago
brownirl / rlang
A Declarative Language for Expressing Partial World Knowledge to Reinforcement Learning Agents
☆17 · Jan 19, 2024 · Updated 2 years ago
AI4Bharat / IndicSUPERB
☆15 · Apr 9, 2025 · Updated last year
JesseFarebro / flax-mup
Maximal Update Parametrization (μP) with Flax & Optax.
☆16 · Dec 27, 2023 · Updated 2 years ago
antiguru / flatcontainer
A flat container abstraction for Rust
☆17 · Nov 24, 2025 · Updated 8 months ago
brendel-group / compositional-ood-generalization
Official code for the paper "Compositional Generalization from First Principles" (NeurIPS 2023)
☆15 · Jul 25, 2023 · Updated 3 years ago
tianyu139 / tangent-model-composition
Code for Tangent Model Composition for Ensembling and Continual Fine-tuning (ICCV 2023) and Tangent Transformers for Composition, Privacy…
☆14 · May 14, 2024 · Updated 2 years ago
belindal / state-tracking
Code and data for paper "(How) do Language Models Track State?"
☆26 · Mar 31, 2025 · Updated last year
sharmaeklavya2 / croupier
Cross-connect stdin and stdout of 2 processes and show outputs from each. (No longer maintained)
☆16 · Nov 18, 2020 · Updated 5 years ago
Laz4rz / matryoshka
Implementation of "Matryoshka-Adaptor: Unsupervised and Supervised Tuning for Smaller Embedding Dimensions"
☆25 · Aug 27, 2024 · Updated last year
kyrie-23 / linear_task_arithmetic
☆12 · Jul 30, 2025 · Updated 11 months ago
dvruette / gidd-easydel
☆25 · Dec 16, 2025 · Updated 7 months ago
Freshwind-Bioinformatics / TABR-BERT
TABR-BERT: an Accurate and Robust BERT-based Transfer Learning Model for TCR-pMHC Interaction Prediction
☆12 · Jul 19, 2024 · Updated 2 years ago
cfpark00 / concept-learning
Concept Learning Dynamics
☆17 · Oct 29, 2024 · Updated last year
maragraziani / concept_discovery_svd
Automatic identification of regions in the latent space of a model that correspond to unique concepts, namely to concepts with a semantic…
☆14 · Nov 22, 2023 · Updated 2 years ago
grapheo12 / iqps
Web-app meant for qp.metakgp.org
☆21 · Dec 8, 2022 · Updated 3 years ago
fionn / feynman
Calculate allowed interactions in QED
☆10 · Nov 2, 2022 · Updated 3 years ago
cloneofsimo / minSAE
☆30 · Dec 2, 2024 · Updated last year
google-deepmind / nanodo
☆304 · Jul 15, 2024 · Updated 2 years ago
tcapelle / torch_moving_mnist
A simple Dataset generator for Moving Mnist
☆14 · May 26, 2023 · Updated 3 years ago
whistlebee / pytorch-hosvd
Higher Order SVD implementation in PyTorch
☆13 · Nov 14, 2022 · Updated 3 years ago
SilentView / EMCID
Official Implementation for "Editing Massive Concepts in Text-to-Image Diffusion Models"
☆19 · Mar 21, 2024 · Updated 2 years ago
kuterd / opal_ptx
Experimental GPU language with meta-programming
☆31 · Sep 6, 2024 · Updated last year
barthelemymp / TULIP-TCR
☆14 · May 15, 2024 · Updated 2 years ago
anpaure / cp_eval
Tiny evaluation of leading LLMs on competitive programming problems
☆14 · Apr 10, 2026 · Updated 3 months ago
kj3moraes / movieclip
An experiment with movie scenes and contrastive learning
☆11 · Feb 1, 2025 · Updated last year
divyamakkar0 / JAXformer
A zero-to-one guide on scaling modern transformers with n-dimensional parallelism.
☆127 · Dec 29, 2025 · Updated 6 months ago