Official Implementation for NorMuon paper
☆71Apr 30, 2026Updated 3 weeks ago
Alternatives and similar repositories for NorMuon
Users that are interested in NorMuon are comparing it to the libraries listed below.
- ☆20Feb 2, 2026Updated 3 months ago
- Switch EMA: A Free Lunch for Better Flatness and Sharpness☆28Feb 16, 2024Updated 2 years ago
- Official repository for Flash Local Linear Attention☆23Apr 23, 2026Updated last month
- ☆29Mar 10, 2026Updated 2 months ago
- Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization (ACL 2021)☆19Jul 28, 2021Updated 4 years ago
- Training tiny models to prove hard theorems☆77Mar 5, 2026Updated 2 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer☆252Jun 15, 2025Updated 11 months ago
- Local AI runtime for training & running small LLMs directly on Apple Neural Engine (ANE). No CoreML. No Metal. Offline, on-device fine-tu…☆93Mar 6, 2026Updated 2 months ago
- Simple and Ideal Circuit Simulation☆13Dec 4, 2017Updated 8 years ago
- This pytorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).☆46Oct 17, 2022Updated 3 years ago
- No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (ICLR 2022)☆29Feb 9, 2022Updated 4 years ago
- Scrapes novels published on the web and converts them to Aozora Bunko format text☆19Feb 9, 2017Updated 9 years ago
- Manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices☆10Jan 12, 2021Updated 5 years ago
- LEMMA: Logical Engine for Multi-domain Mathematical Analysis☆28Feb 14, 2026Updated 3 months ago
- Implementation of Strassen attention, from Kozachinskiy et al. of National Center of AI in Chile☆41Jul 8, 2025Updated 10 months ago
- ☆10Aug 18, 2016Updated 9 years ago
- Code for Paper: Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data☆36Nov 16, 2020Updated 5 years ago
- Recursive Self-Aggregation evals on ARC-AGI☆36Jan 26, 2026Updated 4 months ago
- ☆13Oct 8, 2021Updated 4 years ago
- A PyTorch implementation of a conditional Denoising Diffusion Probabilistic Model (DDPM) for multi-modal trajectory prediction. This proj…☆39Feb 20, 2026Updated 3 months ago
- ☆19Aug 23, 2025Updated 9 months ago
- ☆13Jan 14, 2026Updated 4 months ago
- FDFO: Finite Difference Flow Optimization☆108Apr 27, 2026Updated last month
- coloring terminal text with intensities (used for plotting probability, entropy with tokens)☆12Oct 11, 2024Updated last year
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML2023)☆40Aug 28, 2023Updated 2 years ago
- ☆63Apr 8, 2026Updated last month
- A few models converted from Caffe to Core ML format.☆15Jun 6, 2017Updated 8 years ago
- Tiny Llama model trained to play chess☆30Jul 22, 2025Updated 10 months ago
- Python package for programmatic animation in human style sketching☆42May 17, 2026Updated last week
- ☆40Updated this week
- API server for VibeVoice☆29Sep 28, 2025Updated 7 months ago
- Combining SOAP and MUON☆22Feb 11, 2025Updated last year
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"☆13Jul 18, 2024Updated last year
- ☆40Feb 14, 2026Updated 3 months ago
- The official repo for "CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models"☆33Mar 26, 2026Updated 2 months ago
- libtpms / swtpm software emulation of a Trusted Platform Module (TPM 1.2 and TPM 2.0) compile script☆13Sep 16, 2020Updated 5 years ago
- Train and run transformers directly on Apple's Neural Engine in Swift, bypassing Core ML entirely☆108Apr 18, 2026Updated last month
- The official repo for "LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling"☆133May 15, 2026Updated last week
- ☆10Apr 23, 2021Updated 5 years ago