kyegomez/Mixture-of-Depths

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/kyegomez/Mixture-of-Depths)

kyegomez / Mixture-of-Depths

Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"

☆123

Alternatives and similar repositories for Mixture-of-Depths

Users that are interested in Mixture-of-Depths are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

sramshetty / mixture-of-depths
View on GitHub
An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆35Jun 7, 2024Updated 2 years ago
astramind-ai / Mixture-of-depths
View on GitHub
Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆175Jun 20, 2024Updated 2 years ago
Mixture-AI / Mixture-of-Depths
View on GitHub
Google DeepMind: Mixture of Depths Unofficial Implementation.
☆12May 29, 2024Updated 2 years ago
IBM / selective-dense-state-space-model
View on GitHub
Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on …
☆16Sep 18, 2025Updated 10 months ago
The-Swarm-Corporation / Research-Paper-Writer-Swarm
View on GitHub
Automate the creation of high quality research papers in latex. Powered by Swarms 🤖
☆11Dec 1, 2025Updated 7 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
The-Swarm-Corporation / Brainwave
View on GitHub
Brainwave is a state-of-the-art neural decoder that transforms electroencephalogram (EEG) and brain signals into multimodal outputs inclu…
☆14Oct 6, 2025Updated 9 months ago
kyegomez / HSSS
View on GitHub
Implementation of a Hierarchical Mamba as described in the paper: "Hierarchical State Space Models for Continuous Sequence-to-Sequence Mo…
☆16Nov 11, 2024Updated last year
huyphan168 / PEER
View on GitHub
Mixture of A Million Experts
☆56Jul 30, 2024Updated last year
kyegomez / MHMoE
View on GitHub
Community Implementation of the paper: "Multi-Head Mixture-of-Experts" In PyTorch
☆31Updated this week
The-Swarm-Corporation / AgentParse
View on GitHub
AgentParse is a high-performance parsing library designed to map various structured data formats (such as Pydantic models, JSON, YAML, an…
☆18Oct 13, 2025Updated 9 months ago
kyegomez / SoundStream
View on GitHub
Implementation of SoundtStream from the paper: "SoundStream: An End-to-End Neural Audio Codec"
☆13Jan 27, 2025Updated last year
yxli2123 / LoSparse
View on GitHub
☆64Oct 17, 2023Updated 2 years ago
kyegomez / OmniByteFormer
View on GitHub
OmniByteFormer is a generalized Transformer model that can process any type of data by converting it into byte sequences, bypassing tradi…
☆16Jul 20, 2026Updated last week
kyegomez / OpenStrawberry
View on GitHub
An open source replication of the stawberry method that leverages Monte Carlo Search with PPO and or DPO
☆31Updated this week
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
GAIR-NLP / AIME-Preview
View on GitHub
☆84Mar 11, 2025Updated last year
kyegomez / MoE-Mamba
View on GitHub
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze…
☆133Updated this week
Shaikhershad / Bulk-Image-Downloader-Free
View on GitHub
bulk image downloader freeware, reddit bulk image downloader, bulk image downloader extension, bulk image downloader from url, bulk image…
☆26Feb 19, 2026Updated 5 months ago
i4Ds / whisper-finetune
View on GitHub
This repository contains code for fine-tuning the Whisper speech-to-text model.
☆24Jul 9, 2026Updated 2 weeks ago
T-Lab-CUHKSZ / G2RPO-A
View on GitHub
[ACL 2026] G2RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance
☆16May 20, 2026Updated 2 months ago
The-Swarm-Corporation / OmniParse
View on GitHub
Transform unstructured documents into actionable, structured data with enterprise-grade precision and reliability, ready for large-scale …
☆20Oct 13, 2025Updated 9 months ago
recursal / minmodmon
View on GitHub
Mini Model Daemon
☆13Nov 9, 2024Updated last year
TheStarOfMSY / MoEAD
View on GitHub
MoEAD is a parameter efficient model for multi class anomaly detection
☆36Dec 20, 2024Updated last year
The-Swarm-Corporation / agentverse
View on GitHub
Various agents from all of the top agent frameworks to integrate into swarms! Langchain, Griptape, CrewAI, and more!
☆17Dec 22, 2025Updated 7 months ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
UCDvision / NOLA
View on GitHub
Code for NOLA, an implementation of "nola: Compressing LoRA using Linear Combination of Random Basis"
☆59Aug 25, 2024Updated last year
kyegomez / Pegasus
View on GitHub
PegasusX: The Future of Multimodal Embeddings 🦄 🦄
☆14Oct 16, 2024Updated last year
zirui-ray-liu / DivAug
View on GitHub
☆13Aug 25, 2021Updated 4 years ago
lucidrains / infini-transformer-pytorch
View on GitHub
Implementation of Infini-Transformer in Pytorch
☆112Jan 4, 2025Updated last year
The-Swarm-Corporation / AgentGym
View on GitHub
A framework making it effortless to convert any llm model into a reasoning agent like o1 or DeepSeek's r1
☆24Oct 13, 2025Updated 9 months ago
kyegomez / Falcon
View on GitHub
A simple package for leveraging Falcon 180B and the HF ecosystem's tools, including training/inference scripts, safetensors, integrations…
☆12Mar 11, 2024Updated 2 years ago
elicit / fave-dataset
View on GitHub
Paper dataset for "Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers"
☆14Oct 20, 2024Updated last year
IBM / learn-vector-symbolic-architectures-rule-formulations
View on GitHub
PyTorch Implementation of the paper "Probabilistic Abduction for Visual Abstract Reasoning via Learning Rules in Vector-symbolic Architec…
☆10Sep 18, 2025Updated 10 months ago
zyxxmu / cam
View on GitHub
Pytorch implementation of our paper accepted by ICML 2024 -- CaM: Cache Merging for Memory-efficient LLMs Inference
☆50Jun 19, 2024Updated 2 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
zwhong714 / adaptive_decoding
View on GitHub
[ICML2024]Adaptive decoding balances the diversity and coherence of open-ended text generation.
☆19Jun 2, 2024Updated 2 years ago
kyegomez / Prometheus
View on GitHub
Welcome to Prometheus, the revolutionary AI model that allows you to generate DNA sequences for any creature you can imagine. Whether it’…
☆15Updated this week
shawntan / scattermoe
View on GitHub
Triton-based implementation of Sparse Mixture of Experts.
☆281Oct 3, 2025Updated 9 months ago
allenai / OLMoE
View on GitHub
OLMoE: Open Mixture-of-Experts Language Models
☆1,046Sep 23, 2025Updated 10 months ago
tim-roderick / VST
View on GitHub
Video Summarization Transformer: Implementation in PyTorch of the Transformer model for video summarisation
☆10Oct 27, 2020Updated 5 years ago
kyegomez / qformer
View on GitHub
Implementation of Qformer from BLIP2 in Zeta Lego blocks.
☆51Nov 11, 2024Updated last year
arjunguha / BigCodeBench-X
View on GitHub
A benchmark of programming tasks for LLMs that supports almost any programming language.
☆13Jun 30, 2025Updated last year