Minimal (truly) muP implementation, consistent with the notation of the TP4 and TP5 papers
☆14 · Jan 2, 2026 · Updated 5 months ago
Alternatives and similar repositories for mup
Users who are interested in mup are comparing it to the libraries listed below.
- ☆24 · Jun 4, 2024 · Updated 2 years ago
- Code for the paper "Data Attribution for Text-to-Image Models by Unlearning Synthesized Images." ☆17 · May 23, 2025 · Updated last year
- Code for the paper "Function-Space Learning Rates" ☆24 · Jun 3, 2025 · Updated last year
- ☆27 · May 3, 2024 · Updated 2 years ago
- NanoGPT (124M) in 5 minutes ☆15 · Feb 14, 2025 · Updated last year
- Website for CSE 234, Winter 2025 ☆15 · Mar 24, 2025 · Updated last year
- Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs" ☆24 · Apr 30, 2025 · Updated last year
- Code and data for paper "(How) do Language Models Track State?" ☆24 · Mar 31, 2025 · Updated last year
- ☆41 · Dec 12, 2025 · Updated 6 months ago
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam. ☆87 · Jul 28, 2024 · Updated last year
- ☆52 · Mar 14, 2025 · Updated last year
- Official implementation of: "Online Marker-free Extrinsic Camera Calibration using Person Keypoint Detections" by Pätzold, Bultmann & Beh… ☆23 · Feb 1, 2024 · Updated 2 years ago
- run tinygrad kernels on esp32 ☆14 · Nov 28, 2023 · Updated 2 years ago
- Intriguing Properties of Data Attribution on Diffusion Models (ICLR 2024) ☆39 · Jan 23, 2024 · Updated 2 years ago
- Maximal Update Parametrization (μP) with Flax & Optax. ☆16 · Dec 27, 2023 · Updated 2 years ago
- Official implementation of Categorical Flow Maps on text. ☆59 · Feb 16, 2026 · Updated 4 months ago
- [ICML 2025] No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces (official repository) ☆45 · Aug 7, 2025 · Updated 10 months ago
- A flat container abstraction for Rust ☆17 · Nov 24, 2025 · Updated 6 months ago
- Official code for the paper "Compositional Generalization from First Principles" (NeurIPS 2023) ☆15 · Jul 25, 2023 · Updated 2 years ago
- ☆13 · Nov 5, 2024 · Updated last year
- Implementation of "Matryoshka-Adaptor: Unsupervised and Supervised Tuning for Smaller Embedding Dimensions" ☆24 · Aug 27, 2024 · Updated last year
- Code for Tangent Model Composition for Ensembling and Continual Fine-tuning (ICCV 2023) and Tangent Transformers for Composition, Privacy… ☆14 · May 14, 2024 · Updated 2 years ago
- ☆41 · Jan 12, 2026 · Updated 5 months ago
- A Declarative Language for Expressing Partial World Knowledge to Reinforcement Learning Agents ☆17 · Jan 19, 2024 · Updated 2 years ago
- Cross-connect stdin and stdout of 2 processes and show outputs from each. (No longer maintained) ☆16 · Nov 18, 2020 · Updated 5 years ago
- Help protect against malicious build scripts ☆29 · May 31, 2026 · Updated 2 weeks ago
- TABR-BERT: an Accurate and Robust BERT-based Transfer Learning Model for TCR-pMHC Interaction Prediction ☆12 · Jul 19, 2024 · Updated last year
- ☆12 · Jul 30, 2025 · Updated 10 months ago
- Automatic identification of regions in the latent space of a model that correspond to unique concepts, namely to concepts with a semantic… ☆14 · Nov 22, 2023 · Updated 2 years ago
- Calculate allowed interactions in QED ☆10 · Nov 2, 2022 · Updated 3 years ago
- Concept Learning Dynamics ☆16 · Oct 29, 2024 · Updated last year
- Web-app meant for qp.metakgp.org ☆21 · Dec 8, 2022 · Updated 3 years ago
- ☆306 · Jul 15, 2024 · Updated last year
- ☆14 · May 15, 2024 · Updated 2 years ago
- Higher Order SVD implementation in PyTorch ☆13 · Nov 14, 2022 · Updated 3 years ago
- ☆30 · Dec 2, 2024 · Updated last year
- A simple Dataset generator for Moving Mnist ☆14 · May 26, 2023 · Updated 3 years ago
- Experimental GPU language with meta-programming ☆31 · Sep 6, 2024 · Updated last year
- [RA-L25/ICRA26] HybridTrack: A Hybrid Approach for Robust Multi-Object Tracking ☆41 · Dec 17, 2025 · Updated 5 months ago