[ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxiang Li, Lu Yin, Shiwei Liu
☆29 · Jul 24, 2025 · Updated 9 months ago
Alternatives and similar repositories for MixLN
Users who are interested in MixLN compare it to the repositories listed below.
- Official Pytorch Implementation of "Outlier-weighed Layerwise Sampling for LLM Fine-tuning" by Pengxiang Li, Lu Yin, Xiaowei Gao, Shiwei … ☆35 · Jun 3, 2025 · Updated 11 months ago
- The official repo for the DanQing dataset. ☆35 · Mar 25, 2026 · Updated last month
- ☆36 · Mar 12, 2025 · Updated last year
- An official repository for GPTailor ☆17 · Jun 29, 2025 · Updated 10 months ago
- ☆12 · Jun 13, 2025 · Updated 10 months ago
- [ICLR 2025] Weighted-Reward Preference Optimization for Implicit Model Fusion ☆14 · Mar 17, 2025 · Updated last year
- [ACL'25 Findings] Official repo for "HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task" ☆41 · Apr 7, 2025 · Updated last year
- The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization ☆19 · Mar 7, 2025 · Updated last year
- ☆19 · Jan 10, 2025 · Updated last year
- ☆27 · Mar 29, 2025 · Updated last year
- [ACL Findings 2026] Official Implementation of "FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acc… ☆31 · Apr 14, 2026 · Updated 2 weeks ago
- ☆30 · Jul 22, 2024 · Updated last year
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆35 · Jun 12, 2024 · Updated last year
- ☆15 · Apr 6, 2026 · Updated 3 weeks ago
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models ☆22 · May 28, 2024 · Updated last year
- Fork of Flame repo for training of some new stuff in development ☆19 · Apr 24, 2026 · Updated last week
- The official implementation of the paper "Uncovering the Redundancy in Transformers via a Unified Study of Layer Dropping (TMLR)". ☆190 · Apr 23, 2026 · Updated last week
- This is an implementation of the paper "Are We Done with Object-Centric Learning?" ☆12 · Apr 12, 2026 · Updated 3 weeks ago
- ☆14 · Jan 22, 2025 · Updated last year
- [ICML'25] "Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding" by Jiajun Zhu, Peihao Wang, Ruisi… ☆15 · Jun 6, 2025 · Updated 10 months ago
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆33 · Mar 26, 2025 · Updated last year
- ☆41 · Nov 22, 2025 · Updated 5 months ago
- ☆56 · Nov 26, 2024 · Updated last year
- ☆28 · May 13, 2025 · Updated 11 months ago
- The official implementation of the ECCV'24 paper MC-CoT: Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models w… ☆26 · May 19, 2024 · Updated last year
- Official implementation of "OpenCity3D: What do Vision-Language Models know about Urban Environments?" @ WACV2025 ☆17 · Nov 24, 2024 · Updated last year
- ☆16 · Sep 22, 2024 · Updated last year
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models". ☆18 · Apr 25, 2025 · Updated last year
- ☆21 · Oct 2, 2024 · Updated last year
- ☆19 · Oct 14, 2024 · Updated last year
- Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025) ☆19 · Jul 1, 2025 · Updated 10 months ago
- [ICML 2024] Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks ☆41 · Feb 4, 2025 · Updated last year
- Github Repo for OATS: Outlier-Aware Pruning through Sparse and Low Rank Decomposition ☆21 · Apr 16, 2025 · Updated last year
- [ECMLPKDD 2020] "Topological Insights into Sparse Neural Networks" ☆13 · May 2, 2022 · Updated 4 years ago
- Official code release for "SuperBPE: Space Travel for Language Models" ☆91 · Jan 9, 2026 · Updated 3 months ago
- AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference ☆20 · Jan 24, 2025 · Updated last year
- [EMNLP 2025] Code for paper "Table-R1: Inference-Time Scaling for Table Reasoning" ☆29 · Jun 3, 2025 · Updated 11 months ago
- [ICLR 2025] Official implementation of paper "Dynamic Low-Rank Sparse Adaptation for Large Language Models". ☆24 · Mar 16, 2025 · Updated last year
- [ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆76 · Jun 25, 2025 · Updated 10 months ago