The open-source Mixture of Depths code and the official implementation of the paper "Router-Tuning: A Simple and Effective Approach for Enabling Dynamic Depth in Transformers" (EMNLP 2025).
☆31 · May 12, 2026 · Updated last month
Alternatives and similar repositories for Router-Tuning-Mixture-of-Depths
Users that are interested in Router-Tuning-Mixture-of-Depths are comparing it to the libraries listed below.
- Source code of ACL 2023 Main Conference Paper "PAD-Net: An Efficient Framework for Dynamic Networks". ☆14 · Feb 28, 2026 · Updated 3 months ago
- Source code of EMNLP 2022 Findings paper "SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters" ☆23 · Feb 28, 2026 · Updated 3 months ago
- The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques (TMLR)". ☆90 · Feb 28, 2026 · Updated 3 months ago
- The official implementation of the paper "Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity". ☆17 · Jul 2, 2024 · Updated last year
- ☆14 · Aug 18, 2022 · Updated 3 years ago
- PyTorch code for FedHyper ☆11 · Aug 28, 2024 · Updated last year
- The official implementation of the paper "Uncovering the Redundancy in Transformers via a Unified Study of Layer Dropping (TMLR)". ☆191 · Apr 23, 2026 · Updated last month
- Sparse Backpropagation for Mixture-of-Expert Training ☆30 · Jul 2, 2024 · Updated last year
- Code release for AdapMoE accepted by ICCAD 2024 ☆38 · Apr 28, 2025 · Updated last year
- Implementation for EACL 2024 paper "Corpus-Steered Query Expansion with Large Language Models" ☆13 · Mar 19, 2024 · Updated 2 years ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆31 · Nov 14, 2023 · Updated 2 years ago
- [ACL 2026] Paper list of Video LLM hallucination. Welcome to Star and Contribute! ☆35 · Updated this week
- [NeurIPS 2023] Latent Graph Inference with Limited Supervision ☆33 · Feb 1, 2024 · Updated 2 years ago
- [ICML 2024] Code for the paper "MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts" ☆10 · Jul 1, 2024 · Updated last year
- awesome video representation learning ☆15 · Mar 22, 2021 · Updated 5 years ago
- [CVPR24] OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising ☆16 · Apr 4, 2024 · Updated 2 years ago
- ☆21 · Jun 4, 2024 · Updated 2 years ago
- A multi-lingual benchmark for evaluating industrial domain knowledge of LLMs. ☆153 · Updated this week
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆32 · Apr 9, 2025 · Updated last year
- ☆12 · Sep 23, 2024 · Updated last year
- Website for HKU NLP group (under construction) ☆14 · Mar 20, 2026 · Updated 2 months ago
- ☆13 · Jul 14, 2024 · Updated last year
- A database of advisor information for Hong Kong, Macau, Singapore, and other regions, aimed at Chinese students (especially those affected by the 10043 policy). An open-source database of professors in HK/MO/SG/etc. for Chinese students (esp. those affected… ☆55 · Nov 26, 2025 · Updated 6 months ago
- ☆64 · Nov 25, 2025 · Updated 6 months ago
- Survey articles related to deep learning ☆13 · Mar 2, 2019 · Updated 7 years ago
- ☆19 · Nov 30, 2025 · Updated 6 months ago
- ☆14 · Apr 15, 2025 · Updated last year
- ☆40 · Apr 7, 2026 · Updated 2 months ago
- Code repo for efficient quantized MoE inference with mixture of low-rank compensators ☆36 · Apr 14, 2025 · Updated last year
- ☆10 · Oct 8, 2021 · Updated 4 years ago
- ☆15 · Jun 21, 2024 · Updated last year
- Model acceleration / model compression (all labs completed) ☆11 · Dec 24, 2023 · Updated 2 years ago
- This is the code repo for the paper "RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards". ☆24 · Oct 28, 2024 · Updated last year
- Extending BookSim2.0 and HotSpot6.0 for Power, Performance and Thermal evaluation of 3D NoC Architectures ☆14 · Aug 9, 2019 · Updated 6 years ago
- Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs) ☆66 · Jul 24, 2025 · Updated 10 months ago
- Step into the unknown darkness, converse with the hidden secrets. Behind every baffling scenario, uncover the shocking truth. Let's explo… ☆16 · Oct 7, 2023 · Updated 2 years ago
- Minimal implementation of Denoised Smoothing (https://arxiv.org/abs/2003.01908) in TensorFlow. ☆20 · Aug 4, 2021 · Updated 4 years ago
- ☆14 · Sep 8, 2019 · Updated 6 years ago
- Extract KITTI imu and gnss data from raw data for ORB_SLAM3 evaluation. The imu data and gnss data are stored in EuRoC format. ☆22 · Sep 14, 2022 · Updated 3 years ago