BryceZhuo / HybridNorm
The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
☆18 · Mar 7, 2025 · Updated 11 months ago
Alternatives and similar repositories for HybridNorm
Users who are interested in HybridNorm are comparing it to the repositories listed below.
- Training InstructPi2Pix with SDXL. ☆19 · Sep 8, 2023 · Updated 2 years ago
- code for Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning ☆20 · Jul 16, 2024 · Updated last year
- Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024) ☆25 · Oct 23, 2024 · Updated last year
- [ICLR 2025] Official PyTorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆29 · Jul 24, 2025 · Updated 6 months ago
- ☆33 · Oct 4, 2024 · Updated last year
- [KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations ☆83 · Feb 6, 2026 · Updated last week
- User-friendly implementation of the Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head with expert choice rou… ☆28 · May 3, 2025 · Updated 9 months ago
- CogKTR: A Knowledge-Enhanced Text Representation Toolkit for Natural Language Understanding. EMNLP 2022 ☆31 · Oct 14, 2022 · Updated 3 years ago
- ☆27 · Jul 25, 2023 · Updated 2 years ago
- Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient ☆66 · Aug 3, 2025 · Updated 6 months ago
- A summarizer for Japanese articles (but ChatGPT is better) ☆10 · Aug 1, 2022 · Updated 3 years ago
- Martingale posterior neural networks for fast sequential decision making @ NeurIPS 2025