Google DeepMind: Mixture of Depths Unofficial Implementation.
☆12May 29, 2024Updated last year
Alternatives and similar repositories for Mixture-of-Depths
Users who are interested in Mixture-of-Depths compare it to the libraries listed below
- [ECCV 2024] Official pytorch implementation of "Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts"☆47Jul 4, 2024Updated last year
- [ICLR 2024] Official pytorch implementation of "Denoising Task Routing for Diffusion Models"☆24Feb 19, 2024Updated 2 years ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆115Updated this week
- Solution to the AMOS-MM challenge☆13Sep 13, 2025Updated 5 months ago
- Supply chain demo based on the FISCO-BCOS blockchain, with a backend built in Node.js☆10Jan 28, 2021Updated 5 years ago
- AI Studio by Metric Coders: A No-Code Software to train, download and deploy Large Language Models.☆11Jul 5, 2024Updated last year
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆177Jun 20, 2024Updated last year
- Decoupled Neural Interfaces Using Synthetic Gradients - under development☆11Jun 27, 2025Updated 8 months ago
- Testing Difference Target Propagation (DTP) on MNIST.☆12Oct 12, 2020Updated 5 years ago
- EnSaaS document☆11Oct 1, 2025Updated 5 months ago
- Explanation of the llama2 repo.☆12Jul 18, 2024Updated last year
- Advanced audio player component (audix) for Streamlit with waveform visualization and region selection☆14Jun 24, 2025Updated 8 months ago
- ☆12Sep 24, 2024Updated last year
- developing tools for LIAF-SNNs and LIF-SNNs☆10Sep 14, 2022Updated 3 years ago
- [ICLR 2025] Adaptive prompt tailored pruning of T2I diffusion models.☆15Feb 1, 2025Updated last year
- "Towards Scaling Difference Target Propagation by Learning Backprop Targets" (ICML 2022)☆12Jan 17, 2023Updated 3 years ago
- [NeurIPS 2024] Advancing Training Efficiency of Deep Spiking Neural Networks through Rate-based Backpropagation☆19Jan 16, 2025Updated last year
- HACSurv: A Hierarchical Copula-based Approach for Survival Analysis with Dependent Competing Risks☆12Mar 5, 2025Updated last year
- Implementation of spiking DQN training using different conversion techniques and backpropagation with surrogate gradients employed on the…☆11Feb 11, 2023Updated 3 years ago
- ☆15Apr 11, 2024Updated last year
- [3DV 2025] CoE: Deep Coupled Embedding for Non-Rigid Point Cloud Correspondences☆18Jan 5, 2026Updated 2 months ago
- Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025)☆18Jul 1, 2025Updated 8 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…☆56Feb 28, 2023Updated 3 years ago
- DLL injection tool☆12Nov 9, 2020Updated 5 years ago
- A comprehensive collection of Claude Code skills for document generation, styling, and manipulation. Includes Document Polisher with 10 p…☆47Dec 3, 2025Updated 3 months ago
- ZJU standard C Compiler☆11Dec 18, 2016Updated 9 years ago
- This code presents examples of constructing primitives for data structures with Hyperdimensional Computing/Vector Symbolic Architectures☆15Jun 4, 2021Updated 4 years ago
- Unreal Engine 5 3D Platformer game prototype☆17May 27, 2024Updated last year
- PilotFish harvests the free GPU cycles of cloud gaming with deep learning training☆14Jul 2, 2022Updated 3 years ago
- ECE385 lab from UIUC☆14Nov 13, 2021Updated 4 years ago
- ☆16May 23, 2024Updated last year
- ☆15Jun 26, 2024Updated last year
- [ECCV 2024] Versatile Incremental Learning: Towards Class and Domain-Agnostic Incremental Learning☆16Sep 23, 2024Updated last year
- The Go driver for MongoDB☆15Nov 3, 2021Updated 4 years ago
- ☆45Dec 6, 2025Updated 3 months ago
- ☆17Nov 26, 2024Updated last year
- Llama causal LM fully recreated in LibTorch. Designed to be used in Unreal Engine 5☆16Sep 19, 2024Updated last year
- Repository for Sparse Universal Transformers☆20Oct 23, 2023Updated 2 years ago
- Code for "Accelerating Transformer Pre-training with 2:4 Sparsity"☆27Dec 8, 2024Updated last year