zichongli5/NorMuon

Official implementation for the NorMuon paper

☆71

Alternatives and similar repositories for NorMuon

Users who are interested in NorMuon are comparing it to the libraries listed below.

lliu606 / COSMOS
☆20 · Feb 2, 2026 · Updated 3 months ago
Westlake-AI / SEMA
Switch EMA: A Free Lunch for Better Flatness and Sharpness
☆28 · Feb 16, 2024 · Updated 2 years ago
Yifei-Zuo / Flash-LLA
Official repository for Flash Local Linear Attention
☆23 · Apr 23, 2026 · Updated last month
optsuite / OptMATH
☆29 · Mar 10, 2026 · Updated 2 months ago
cliang1453 / super-structured-lottery-tickets
Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization (ACL 2021)
☆19 · Jul 28, 2021 · Updated 4 years ago
CMU-AIRe / QED-Nano
Training tiny models to prove hard theorems
☆77 · Mar 5, 2026 · Updated 2 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆252 · Jun 15, 2025 · Updated 11 months ago
mechramc / Orion
Local AI runtime for training & running small LLMs directly on Apple Neural Engine (ANE). No CoreML. No Metal. Offline, on-device fine-tu…
☆93 · Mar 6, 2026 · Updated 2 months ago
Immortalin / Simulacra
Simple and Ideal Circuit Simulation
☆13 · Dec 4, 2017 · Updated 8 years ago
QingruZhang / PLATON
This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022).
☆46 · Oct 17, 2022 · Updated 3 years ago
cliang1453 / SAGE
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models (ICLR 2022)
☆29 · Feb 9, 2022 · Updated 4 years ago
tdtds / aozoragen
Scrapes novels published on the web and converts them to Aozora Bunko format text
☆19 · Feb 9, 2017 · Updated 9 years ago
stephenkyang / mean-reversion-pairs-trading
Manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices
☆10 · Jan 12, 2021 · Updated 5 years ago
Pushp-Kharat1 / LEMMA
LEMMA: Logical Engine for Multi-domain Mathematical Analysis
☆28 · Feb 14, 2026 · Updated 3 months ago
lucidrains / strassen-attention
Implementation of Strassen attention, from Kozachinskiy et al. of National Center of AI in Chile
☆41 · Jul 8, 2025 · Updated 10 months ago
s11y / Gomenna-SideStep
☆10 · Aug 18, 2016 · Updated 9 years ago
Lingkai-Kong / Calibrated-BERT-Fine-Tuning
Code for Paper: Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data
☆36 · Nov 16, 2020 · Updated 5 years ago
rsa-llm / RSA-ARC
Recursive Self-Aggregation evals on ARC-AGI
☆36 · Jan 26, 2026 · Updated 4 months ago
VisionLearningGroup / SND
☆13 · Oct 8, 2021 · Updated 4 years ago
AntonioAlgaida / DiffusionTrajectoryPlanner
A PyTorch implementation of a conditional Denoising Diffusion Probabilistic Model (DDPM) for multi-modal trajectory prediction. This proj…
☆39 · Feb 20, 2026 · Updated 3 months ago
MaximeRivest / moereport
☆19 · Aug 23, 2025 · Updated 9 months ago
caiqizh / LUQ
☆13 · Jan 14, 2026 · Updated 4 months ago
NVlabs / finite-difference-flow-optimization
FDFO: Finite Difference Flow Optimization
☆108 · Apr 27, 2026 · Updated last month
swairshah / Intensify
Coloring terminal text with intensities (used for plotting probability and entropy with tokens)
☆12 · Oct 11, 2024 · Updated last year
cliang1453 / task-aware-distillation
Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML 2023)
☆40 · Aug 28, 2023 · Updated 2 years ago
LIONS-EPFL / scion
☆63 · Apr 8, 2026 · Updated last month
mdering / CoreMLZoo
A few models converted from Caffe to Core ML format.
☆15 · Jun 6, 2017 · Updated 8 years ago
lazy-guy / chess-llama
Tiny Llama model trained to play chess
☆30 · Jul 22, 2025 · Updated 10 months ago
subroy13 / handanim
Python package for programmatic animation in human style sketching
☆42 · May 17, 2026 · Updated last week
allenai / olmix
☆40 · Updated this week
vibevoice-community / VibeVoice-API
API server for VibeVoice
☆29 · Sep 28, 2025 · Updated 7 months ago
nikhilvyas / SOAP_MUON
Combining SOAP and MUON
☆22 · Feb 11, 2025 · Updated last year
ckkissane / sae-transfer
Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"
☆13 · Jul 18, 2024 · Updated last year
maderix / SimpLang
☆40 · Feb 14, 2026 · Updated 3 months ago
LARK-AI-Lab / CodeScaler
The official repo for "CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models"
☆33 · Mar 26, 2026 · Updated 2 months ago
rayures / vTPM
Compile script for libtpms / swtpm, a software emulation of a Trusted Platform Module (TPM 1.2 and TPM 2.0)
☆13 · Sep 16, 2020 · Updated 5 years ago
christopherkarani / Espresso
Train and run transformers directly on Apple's Neural Engine in Swift, bypassing Core ML entirely
☆108 · Apr 18, 2026 · Updated last month
zhengkid / AutoTTS
The official repo for "LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling"
☆133 · May 15, 2026 · Updated last week
npvoid / OnlineDoubleOracle
☆10 · Apr 23, 2021 · Updated 5 years ago