facebookresearch / MILS
Code release for "LLMs can see and hear without any training"
☆449 · Updated 3 months ago
Alternatives and similar repositories for MILS
Users who are interested in MILS are comparing it to the repositories listed below.
- Hypernetworks that adapt LLMs for specific benchmark tasks using only a textual task description as input ☆838 · Updated 2 months ago
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆821 · Updated last year
- ☆222 · Updated 5 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆318 · Updated 9 months ago
- [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ☆287 · Updated 3 months ago
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆305 · Updated 2 months ago
- Build your own visual reasoning model ☆404 · Updated last week
- Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model ☆105 · Updated last week
- Live-bending a foundation model’s output at neural network level. ☆268 · Updated 4 months ago
- GRadient-INformed MoE ☆265 · Updated 10 months ago
- Official GitHub repository for FLUX.1 Krea [dev]. ☆314 · Updated 2 weeks ago
- See Through Your Models ☆398 · Updated last month
- Code for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004) ☆128 · Updated last week
- ☆169 · Updated last week
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models ☆1,294 · Updated 2 months ago
- F Lite is a 10B parameter diffusion model created by Freepik and Fal, trained exclusively on copyright-safe and SFW content. ☆402 · Updated last month
- [ICML 2025] Official PyTorch implementation of LongVU ☆394 · Updated 3 months ago
- Codebase for Aria - an Open Multimodal Native MoE ☆1,067 · Updated 6 months ago
- PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs" ☆661 · Updated last year
- Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation ☆788 · Updated 2 months ago
- Pivotal Token Search ☆119 · Updated last month
- GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T… ☆228 · Updated last week
- Official repository for "DynaSaur: Large Language Agents Beyond Predefined Actions" ☆348 · Updated 7 months ago
- LLaVA-Interactive-Demo ☆376 · Updated last year
- Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗 ☆264 · Updated 5 months ago
- Mistral7B playing DOOM ☆133 · Updated last year
- Applying the ideas of Deepseek R1 to computer use ☆217 · Updated 6 months ago
- A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private co… ☆284 · Updated last month
- Vision Language Models are Biased ☆65 · Updated last month
- Official PyTorch implementation of TokenSet. ☆121 · Updated 4 months ago