facebookresearch / MILS
Code release for "LLMs can see and hear without any training"
☆447 Updated 2 months ago
Alternatives and similar repositories for MILS
Users who are interested in MILS are comparing it to the repositories listed below.
- Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input ☆813 Updated last month
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆819 Updated 11 months ago
- [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ☆285 Updated 2 months ago
- Build your own visual reasoning model ☆395 Updated 2 weeks ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆316 Updated 9 months ago
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆305 Updated 2 months ago
- ☆217 Updated 4 months ago
- Dream 7B, a large diffusion language model ☆848 Updated last month
- F Lite is a 10B parameter diffusion model created by Freepik and Fal, trained exclusively on copyright-safe and SFW content. ☆399 Updated last month
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models ☆1,218 Updated last month
- Live-bending a foundation model’s output at neural network level. ☆265 Updated 3 months ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,757 Updated last month
- NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training ☆288 Updated last month
- ☆162 Updated 2 months ago
- GRadient-INformed MoE ☆264 Updated 10 months ago
- Official PyTorch implementation of TokenSet. ☆121 Updated 4 months ago
- Codebase for Aria - an Open Multimodal Native MoE ☆1,061 Updated 6 months ago
- Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model ☆105 Updated 2 weeks ago
- [ICML 2025] Official PyTorch implementation of LongVU ☆391 Updated 2 months ago
- LLaVA-Interactive-Demo ☆375 Updated last year
- [Fully open] [Encoder-free MLLM] Vision as LoRA ☆317 Updated last month
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models ☆737 Updated 2 weeks ago
- LLM2CLIP makes SOTA pretrained CLIP model more SOTA ever. ☆531 Updated 3 weeks ago
- ☆128 Updated 10 months ago
- Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation ☆775 Updated last month
- Pivotal Token Search ☆111 Updated last week
- [ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve perfor… ☆326 Updated last year
- PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs" ☆649 Updated last year
- A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time! ☆1,129 Updated 5 months ago
- Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024) ☆249 Updated this week