The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
☆18Mar 7, 2025Updated last year
Alternatives and similar repositories for HybridNorm
Users that are interested in HybridNorm are comparing it to the libraries listed below
Sorting:
- Training InstructPi2Pix with SDXL.☆19Sep 8, 2023Updated 2 years ago
- code for Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning☆20Jul 16, 2024Updated last year
- Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)☆25Oct 23, 2024Updated last year
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…☆29Jul 24, 2025Updated 7 months ago
- ☆33Oct 4, 2024Updated last year
- User-friendly implementation of the Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head with expert choice rou…☆28May 3, 2025Updated 10 months ago
- CogKTR: A Knowledge-Enhanced Text Representation Toolkit for Natural Language Understanding. EMNLP 2022☆31Oct 14, 2022Updated 3 years ago
- ☆27Jul 25, 2023Updated 2 years ago
- Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient☆66Aug 3, 2025Updated 7 months ago
- A summarizer for Japanese articles (but ChatGPT is better)☆10Aug 1, 2022Updated 3 years ago
- Martingale posterior neural networks for fast sequential decision making @ Neurips 2025☆23Nov 13, 2025Updated 3 months ago
- ☆10Nov 17, 2022Updated 3 years ago
- [KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations☆121Feb 6, 2026Updated last month
- ☆14Mar 20, 2025Updated 11 months ago
- ☆16Feb 22, 2025Updated last year
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"☆10Dec 13, 2024Updated last year
- This is code for the EMNLP 2022 Paper "UniRPG: Unified Discrete Reasoning over Table and Text as Program Generation".☆10Apr 30, 2023Updated 2 years ago
- 🍼 Official implementation of Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts☆41Sep 29, 2024Updated last year
- ☆13Nov 18, 2023Updated 2 years ago
- Conditional Linear Dynamical Systems☆15Oct 7, 2025Updated 5 months ago
- 苏州大学研究生学位论文模板 - Soochow University Thesis TeX Template☆17Feb 27, 2026Updated last week
- [ICML 2024] Code for the paper "MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts"☆10Jul 1, 2024Updated last year
- codes and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"☆10Dec 30, 2024Updated last year
- ☆16Jun 30, 2025Updated 8 months ago
- JPush's officially supported PhoneGap/Cordova plugin (Android & iOS). 极光推送官方支持的 PhoneGap/Cordova ionic2/3 Native插件(Android & iOS)。 http:/…☆10Jul 9, 2017Updated 8 years ago
- Two-stage text summarization with BERT and BART☆11Jan 5, 2022Updated 4 years ago
- JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning☆10Nov 3, 2024Updated last year
- This repository is dedicated to Track 2 of the W-CODA 2024 Workshop, "Multimodal Perception and Comprehension of Corner Cases in Autonomo…☆17Jun 12, 2024Updated last year
- Tiny evaluation of leading LLMs on competitive programming problems☆14Nov 28, 2024Updated last year
- Code for running forward and backward versions of GPT2☆10Nov 20, 2021Updated 4 years ago
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025)☆11Apr 18, 2025Updated 10 months ago
- A Zen approach to configuring your Python project☆15Feb 27, 2026Updated last week
- ☆11Nov 23, 2020Updated 5 years ago
- Our paper is titled "NUS-IDS at FinCausal 2021: Dependency Tree in Graph Neural Networks for better Cause-Effect Span Detection".☆13Feb 11, 2022Updated 4 years ago
- A PyTorch implementation of the paper "MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis".☆12Jan 16, 2023Updated 3 years ago
- Code for COLING 2022 paper "FactMix: Using a Few Labeled In-domain Examples to Generalize to Cross-domain Named Entity Recognition"☆15Jan 15, 2023Updated 3 years ago
- 3D Scene Flow Estimation☆14Sep 24, 2025Updated 5 months ago
- Code for "AtTGen: Attribute Tree Generation for Real-World Attribute Joint Extraction", ACL 2023☆13May 19, 2023Updated 2 years ago
- Code for Semi-crowdsourced Clustering with Deep Generative Models☆12Dec 9, 2022Updated 3 years ago