facebookresearch / MILS
Code release for "LLMs can see and hear without any training"
☆450 Updated 4 months ago
Alternatives and similar repositories for MILS
Users that are interested in MILS are comparing it to the libraries listed below
- Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input ☆859 Updated 3 months ago
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆825 Updated last year
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆322 Updated 10 months ago
- ☆226 Updated 6 months ago
- [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ☆366 Updated last week
- Build your own visual reasoning model ☆408 Updated 2 weeks ago
- Live-bending a foundation model’s output at neural network level. ☆265 Updated 5 months ago
- Voyager is an interactive RGBD video generation model conditioned on camera input, and supports real-time 3D reconstruction. ☆1,062 Updated this week
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆305 Updated 3 months ago
- codes for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004) ☆601 Updated this week
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models ☆1,341 Updated 3 weeks ago
- GRadient-INformed MoE ☆264 Updated 11 months ago
- ☆175 Updated last month
- [ICML 2025] Official PyTorch implementation of LongVU ☆396 Updated 4 months ago
- GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T… ☆240 Updated 2 weeks ago
- [ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve perfor… ☆326 Updated last year
- Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model ☆112 Updated last month
- LLM2CLIP makes SOTA pretrained CLIP model more SOTA ever. ☆541 Updated 2 months ago
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models ☆807 Updated 2 months ago
- Codebase for Aria - an Open Multimodal Native MoE ☆1,067 Updated 7 months ago
- An open source implementation of CLIP (With TULIP Support) ☆162 Updated 3 months ago
- Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗 ☆274 Updated 6 months ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,802 Updated 3 months ago
- F Lite is a 10B parameter diffusion model created by Freepik and Fal, trained exclusively on copyright-safe and SFW content. ☆408 Updated 2 weeks ago
- See Through Your Models ☆400 Updated 2 months ago
- LLaVA-Interactive-Demo ☆379 Updated last year
- Dream 7B, a large diffusion language model ☆959 Updated 3 weeks ago
- Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation ☆798 Updated 2 months ago
- Pretraining and inference code for a large-scale depth-recurrent language model ☆826 Updated last week
- [Fully open] [Encoder-free MLLM] Vision as LoRA ☆337 Updated 3 months ago