cofe-ai/Mu-scaling

Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales

☆32

Alternatives and similar repositories for Mu-scaling

Users who are interested in Mu-scaling are comparing it to the libraries listed below.

shreyansh26 / An-Empirical-Model-of-Large-Batch-Training
An approximate implementation of the OpenAI paper - An Empirical Model of Large-Batch Training for MNIST
☆11 · Nov 19, 2022 · Updated 3 years ago
yegcjs / mixinglaws
☆113 · Jul 15, 2025 · Updated last year
hal-314 / fastai-batch-size-finder
Implementation of OpenAI paper with Simple Noise Scale on Fastai V2
☆19 · Apr 16, 2021 · Updated 5 years ago
tml-epfl / icl-alignment
Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]
☆33 · Jan 23, 2025 · Updated last year
UCSB-NLP-Chang / Prereq_tune
Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning"
☆11 · Jan 10, 2025 · Updated last year
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆43 · Dec 29, 2025 · Updated 6 months ago
allenai / fluid-benchmarking
Fluid Language Model Benchmarking
☆29 · Sep 16, 2025 · Updated 10 months ago
angie-chen55 / pref-learning-ranking-acc
☆13 · Jun 4, 2024 · Updated 2 years ago
dayal-kalra / low-memory-adam
☆14 · Mar 2, 2025 · Updated last year
QingyangZhang / EMPO
[NeurIPS25 Spotlight] EMPO, A Fully Unsupervised RLVR Method
☆103 · Nov 24, 2025 · Updated 8 months ago
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆36 · Sep 24, 2024 · Updated last year
nikhilvyas / SOAP_MUON
Combining SOAP and MUON
☆25 · Feb 11, 2025 · Updated last year
cofe-ai / MSG
Masked Structural Growth for 2x Faster Language Model Pre-training
☆25 · Apr 28, 2024 · Updated 2 years ago
trestad / mitigating-reversal-curse
Code for paper 'Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse'
☆14 · Aug 2, 2024 · Updated last year
janphilippfranken / sami
Self-Supervised Alignment with Mutual Information
☆20 · May 24, 2024 · Updated 2 years ago
cunliangkong / linux-envs
Personal settings for Linux tools, including zsh, vim, tmux, pip.
☆11 · Dec 2, 2019 · Updated 6 years ago
Yuliang-Liu / SPTSv2
☆22 · May 30, 2023 · Updated 3 years ago
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆68 · Nov 1, 2024 · Updated last year
stephenkyang / mean-reversion-pairs-trading
Manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices
☆11 · Jan 12, 2021 · Updated 5 years ago
mozhu621 / LongGenBench
☆37 · Oct 4, 2025 · Updated 9 months ago
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23 · Jun 3, 2025 · Updated last year
shawntan / stickbreaking-attention
Stick-breaking attention
☆63 · Jul 1, 2025 · Updated last year
tianyi-lab / Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…
☆417 · Jun 25, 2025 · Updated last year
thu-ml / TetraJet-MXFP4Training
PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT Model Pre-training
☆40 · May 4, 2026 · Updated 2 months ago
thu-coai / SPaR
☆47 · Jun 11, 2025 · Updated last year
HypherX / Evolution-Analysis
☆25 · Dec 13, 2024 · Updated last year
Infini-AI-Lab / gsm_infinite
☆65 · Jun 12, 2025 · Updated last year
ant-research / M2-Miner
[ICLR 2026] M2-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining
☆55 · Apr 22, 2026 · Updated 3 months ago
MelosY / CAM
☆27 · Feb 20, 2024 · Updated 2 years ago
Jason3900 / corenlp_client
A Python wrapper for Stanford CoreNLP, simple and customizable.
☆13 · Oct 26, 2021 · Updated 4 years ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆93 · Oct 30, 2024 · Updated last year
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132 · Apr 17, 2024 · Updated 2 years ago
liujch1998 / memo-trap
☆23 · Jan 25, 2023 · Updated 3 years ago
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆46 · Jul 20, 2024 · Updated 2 years ago
Muennighoff / FLAN
Provides a minimal implementation to extract FLAN datasets for further processing
☆11 · Feb 1, 2023 · Updated 3 years ago
richardodliu / OpenCodeEval
☆52 · Mar 9, 2026 · Updated 4 months ago
kuribayashi4 / span_based_argumentation_parser
☆11 · Feb 2, 2023 · Updated 3 years ago
thu-coai / BARREL
[ICLR 2026] BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
☆18 · May 21, 2025 · Updated last year
bdusell / stack-attention
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18 · Mar 15, 2024 · Updated 2 years ago