sramshetty/mixture-of-depths

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/sramshetty/mixture-of-depths)

sramshetty / mixture-of-depths

An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"

☆35

Alternatives and similar repositories for mixture-of-depths

Users that are interested in mixture-of-depths are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

kyegomez / Mixture-of-Depths
View on GitHub
Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆123Updated this week
astramind-ai / Mixture-of-depths
View on GitHub
Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆175Jun 20, 2024Updated 2 years ago
ekinakyurek / gpt3-arithmetic
View on GitHub
Scratchpad/Chain-of-Thought Prompts
☆12Jun 6, 2022Updated 4 years ago
thepowerfuldeez / OLMo
View on GitHub
My fork os allen AI's OLMo for educational purposes.
☆28Dec 5, 2024Updated last year
edwardmilsom / function-space-learning-rates-paper
View on GitHub
Code for the paper "Function-Space Learning Rates"
☆23Jun 3, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
alxndrTL / IntroRL
View on GitHub
Repo du cours d'introduction à l'apprentissage par renforcement.
☆19Feb 2, 2025Updated last year
sramshetty / ShortGPT
View on GitHub
Unofficial implementations of block/layer-wise pruning methods for LLMs.
☆78Apr 29, 2024Updated 2 years ago
sanagno / adaptively_sparse_attention
View on GitHub
☆24Jul 7, 2023Updated 3 years ago
NoakLiu / MT2ST
View on GitHub
Accelerating Multitask Training Trough Adaptive Transition [Efficient ML Model]
☆12May 23, 2025Updated last year
joey00072 / ohara
View on GitHub
Collection of autoregressive model implementation
☆84Jun 10, 2026Updated last month
dayal-kalra / low-memory-adam
View on GitHub
☆14Mar 2, 2025Updated last year
MikaStars39 / StableMask
View on GitHub
PyTorch implementation of StableMask (ICML'24)
☆15Jun 27, 2024Updated 2 years ago
epfml / schedules-and-scaling
View on GitHub
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆93Oct 30, 2024Updated last year
samchaineau / llm_slerp_generation
View on GitHub
Repo hosting codes and materials related to speeding LLMs' inference using token merging.
☆37Oct 9, 2025Updated 9 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
kyegomez / VisionLLaMA
View on GitHub
Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta
☆15Nov 11, 2024Updated last year
TiledTensor / TiledLower
View on GitHub
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆13Nov 23, 2024Updated last year
KuangjuX / TileGraph
View on GitHub
TileGraph is an experimental DNN compiler that utilizes static code generation and kernel fusion techniques.
☆11Sep 18, 2024Updated last year
bentherien / mu_learned_optimization
View on GitHub
[Poster; ICLR 2026] [Oral; Neurips OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers
☆16Apr 15, 2026Updated 3 months ago
microsoft / rho
View on GitHub
Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.
☆471Apr 18, 2024Updated 2 years ago
seba-1511 / igt.pth
View on GitHub
Official PyTorch code release for Implicit Gradient Transport, NeurIPS'19
☆21Jun 11, 2019Updated 7 years ago
Mixture-AI / Mixture-of-Depths
View on GitHub
Google DeepMind: Mixture of Depths Unofficial Implementation.
☆12May 29, 2024Updated 2 years ago
stephenkyang / mean-reversion-pairs-trading
View on GitHub
manipulating cointegrated pairs to achieve a market-neutral strategy that outperforms indices
☆11Jan 12, 2021Updated 5 years ago
Yaxin9Luo / Gamma-MOD
View on GitHub
[ICLR2025] γ -MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models
☆45Oct 28, 2025Updated 9 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
convergence-ai / lm2
View on GitHub
Official repo of paper LM2
☆49Feb 13, 2025Updated last year
reissbaker / clevergpt
View on GitHub
Training GPTs to solve interaction nets
☆18Aug 14, 2024Updated last year
TianjinYellow / SPAM-Optimizer
View on GitHub
☆36Mar 12, 2025Updated last year
EQ-bench / Judgemark-v2
View on GitHub
☆29Nov 13, 2025Updated 8 months ago
RLHFlow / GVM
View on GitHub
☆16Jul 29, 2025Updated last year
VDIGPKU / IterNet
View on GitHub
☆14Nov 2, 2022Updated 3 years ago
tim-roderick / VST
View on GitHub
Video Summarization Transformer: Implementation in PyTorch of the Transformer model for video summarisation
☆10Oct 27, 2020Updated 5 years ago
NoakLiu / GraphSnapShot
View on GitHub
GraphSnapShot: Caching Local Structure for Fast Graph Learning [Efficient ML System]
☆40Apr 5, 2026Updated 3 months ago
NoakLiu / Awesome-Efficient-Foundation-Models-Design
View on GitHub
Efficient Foundation Model Design: A Perspective From Model and System Co-Design [Efficient ML System & Model]
☆31Feb 23, 2025Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
mchiquier / llm-mutate
View on GitHub
☆15Oct 7, 2024Updated last year
Yondijr / NER_Transformer
View on GitHub
A transformer model that should be able to solve a simple NER task
☆11Mar 7, 2019Updated 7 years ago
OpenNLPLab / LASP
View on GitHub
Linear Attention Sequence Parallelism (LASP)
☆87Jun 4, 2024Updated 2 years ago
zoujuny / TableCell
View on GitHub
在TableBank的基础上，进一步标注到单元格精度，利用目标检测/分割实现单元格定位。
☆14Dec 11, 2019Updated 6 years ago
PhoebusSi / Thinking-while-Observing
View on GitHub
Code for our ACL-2023 paper: "Combo of Thinking and Observing for Outside-Knowledge VQA"
☆12Jun 30, 2023Updated 3 years ago
Josh-XT / SafeExecute
View on GitHub
Safe Python Code Execution Environment for Language Models
☆17Jul 10, 2026Updated 2 weeks ago
kyegomez / MultiQueryAttention
View on GitHub
This is a simple torch implementation of the high performance Multi-Query Attention
☆16Aug 23, 2023Updated 2 years ago