astramind-ai / Mixture-of-depths

Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
134 · Updated 5 months ago

Related projects

Alternatives and complementary repositories for Mixture-of-depths