cofe-ai / Mu-scaling
Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales
☆32 · Updated 2 years ago
Alternatives and similar repositories for Mu-scaling
Users who are interested in Mu-scaling are comparing it to the libraries listed below.
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆78 · Updated last year
- ☆108 · Updated 4 months ago
- Official implementation of "Training on the Benchmark Is Not All You Need". ☆38 · Updated 11 months ago
- An Experiment on Dynamic NTK Scaling RoPE ☆64 · Updated 2 years ago
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight) ☆181 · Updated 9 months ago
- Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment" ☆76 · Updated 6 months ago
- [ACL 2025] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLM… ☆68 · Updated last year
- The code and data for the paper JiuZhang3.0