facebookresearch / MILS
Code release for "LLMs can see and hear without any training"
☆452 Updated 5 months ago
Alternatives and similar repositories for MILS
Users who are interested in MILS are comparing it to the repositories listed below
- Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input ☆900 Updated 4 months ago
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆826 Updated last year
- Build your own visual reasoning model ☆413 Updated 2 weeks ago
- ☆234 Updated 7 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆325 Updated last year
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆307 Updated 5 months ago
- ☆177 Updated 2 months ago
- [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ☆400 Updated last month
- NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training ☆289 Updated 4 months ago
- Live-bending a foundation model’s output at neural network level. ☆267 Updated 6 months ago
- codes for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004) ☆650 Updated 3 weeks ago
- GRadient-INformed MoE ☆264 Updated last year
- ☆135 Updated last month
- A minimal implementation of DeepMind's Genie world model ☆994 Updated 3 weeks ago
- Voyager is an interactive RGBD video generation model conditioned on camera input, and supports real-time 3D reconstruction. ☆1,279 Updated last week
- Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation ☆810 Updated 4 months ago
- Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗 ☆277 Updated 8 months ago
- [NeurIPS 2025] MMaDA - Open-Sourced Multimodal Large Diffusion Language Models ☆1,445 Updated last week
- [ICML 2025] Official PyTorch implementation of LongVU ☆403 Updated 5 months ago
- Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im… ☆2,699 Updated 2 weeks ago
- Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model ☆121 Updated 2 months ago
- Pretraining and inference code for a large-scale depth-recurrent language model ☆836 Updated last week
- Official PyTorch implementation of TokenSet. ☆125 Updated 7 months ago
- Official GitHub repository for FLUX.1 Krea [dev]. ☆348 Updated 2 months ago
- Codebase for Aria - an Open Multimodal Native MoE ☆1,071 Updated 9 months ago
- LLaVA-Interactive-Demo ☆378 Updated last year
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,827 Updated 3 weeks ago
- Dream 7B, a large diffusion language model ☆1,018 Updated 3 weeks ago
- Pivotal Token Search ☆128 Updated 3 months ago
- [ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve perfor… ☆328 Updated last year