parameterlab / dr-llm
Source code for "Dr.LLM: Dynamic Layer Routing in LLMs"
☆32 · Updated 2 weeks ago
Alternatives and similar repositories for dr-llm
Users interested in dr-llm are comparing it to the libraries listed below.
- ☆26 · Updated last year
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆16 · Updated 4 months ago
- ☆65 · Updated last year
- This repo contains code for the paper: "Can Foundation Models Help Us Achieve Perfect Secrecy?" ☆24 · Updated 2 years ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆28 · Updated last year
- ☆26 · Updated last year
- Code for T-MARS data filtering ☆35 · Updated 2 years ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆78 · Updated 2 years ago
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆31 · Updated 2 years ago
- ☆10 · Updated last year
- [COLM 2024] Early Weight Averaging meets High Learning Rates for LLM Pre-training ☆17 · Updated last year
- HGRN2: Gated Linear RNNs with State Expansion ☆55 · Updated last year
- ☆76 · Updated last year
- Code for "Merging Text Transformers from Different Initializations" ☆19 · Updated 8 months ago
- PyTorch code for the paper "An Empirical Study of Multimodal Model Merging" ☆37 · Updated 2 years ago
- ☆14 · Updated 4 years ago
- MEXMA: Token-level objectives improve sentence representations ☆42 · Updated 9 months ago
- Exploration of automated dataset selection approaches at large scales. ☆48 · Updated 7 months ago
- [NeurIPS 2025] MergeBench: A Benchmark for Merging Domain-Specialized LLMs ☆31 · Updated last month
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆56 · Updated last year
- Adding new tasks to T0 without catastrophic forgetting ☆33 · Updated 3 years ago
- Code for the paper "Accessing higher dimensions for unsupervised word translation" ☆21 · Updated 2 years ago
- ☆37 · Updated 2 years ago
- Utilities for Training Very Large Models ☆58 · Updated last year
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆55 · Updated 2 years ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆55 · Updated 8 months ago
- Data for "Datamodels: Predicting Predictions with Training Data" ☆97 · Updated 2 years ago
- Code for the paper "Query-Key Normalization for Transformers" ☆49 · Updated 4 years ago
- Latest Weight Averaging (NeurIPS HITY 2022) ☆31 · Updated 2 years ago
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models. ☆27 · Updated 2 years ago