facebookresearch / Mixture-of-Transformers
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. TMLR 2025.
☆91 Updated 3 months ago
Alternatives and similar repositories for Mixture-of-Transformers
Users who are interested in Mixture-of-Transformers are comparing it to the libraries listed below.
- LL3M: Large Language and Multi-Modal Model in Jax ☆73 Updated last year
- [ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction ☆65 Updated 3 months ago
- The official github repo for "Diffusion Language Models are Super Data Learners". ☆103 Updated 2 weeks ago
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ☆58 Updated last week
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun ☆56 Updated 5 months ago
- Esoteric Language Models ☆94 Updated 3 weeks ago
- ☆85 Updated last year
- Implementation of a multimodal diffusion transformer in Pytorch ☆103 Updated last year
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆30 Updated 3 months ago
- Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training". ☆131 Updated 2 weeks ago
- NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n… ☆43 Updated 9 months ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… ☆51 Updated 5 months ago
- Remasking Discrete Diffusion Models with Inference-Time Scaling ☆37 Updated 5 months ago
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ☆124 Updated last week
- Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation" ☆176 Updated last year
- ☆34 Updated 3 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆33 Updated 3 weeks ago
- [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆148 Updated last month
- Easily run PyTorch on multiple GPUs & machines ☆46 Updated 2 months ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆179 Updated 2 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… ☆77 Updated 8 months ago
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆114 Updated 11 months ago
- Python Library to evaluate VLM models' robustness across diverse benchmarks ☆210 Updated last week
- 🦾 EvalGIM (pronounced as "EvalGym") is an evaluation library for generative image models. It enables easy-to-use, reproducible automatic… ☆82 Updated 8 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆143 Updated 3 months ago
- ☆101 Updated 11 months ago
- Implementation of the proposed MaskBit from Bytedance AI ☆81 Updated 9 months ago
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆199 Updated 5 months ago
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆127 Updated last year
- Resa: Transparent Reasoning Models via SAEs ☆41 Updated 2 weeks ago