facebookresearch / MILS
Code release for "LLMs can see and hear without any training"
☆451 · Updated 4 months ago
Alternatives and similar repositories for MILS
Users who are interested in MILS are comparing it to the libraries listed below.
- Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input ☆873 · Updated 3 months ago
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆825 · Updated last year
- Build your own visual reasoning model ☆409 · Updated last month
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆321 · Updated 11 months ago
- ☆231 · Updated 6 months ago
- [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ☆389 · Updated 2 weeks ago
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆306 · Updated 4 months ago
- Voyager is an interactive RGBD video generation model conditioned on camera input, and supports real-time 3D reconstruction. ☆1,207 · Updated last week
- Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model ☆120 · Updated last month
- NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training ☆287 · Updated 4 months ago
- Live-bending a foundation model’s output at neural network level. ☆265 · Updated 5 months ago
- Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im… ☆2,339 · Updated last week
- [ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve perfor… ☆326 · Updated last year
- [NeurIPS 2025] MMaDA - Open-Sourced Multimodal Large Diffusion Language Models ☆1,383 · Updated 2 weeks ago
- Official GitHub repository for FLUX.1 Krea [dev]. ☆340 · Updated 2 months ago
- [ICML 2025] Official PyTorch implementation of LongVU ☆398 · Updated 4 months ago
- LLaVA-Interactive-Demo ☆379 · Updated last year
- Official PyTorch implementation of TokenSet. ☆123 · Updated 6 months ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,811 · Updated 4 months ago
- Code for the Molmo Vision-Language Model ☆761 · Updated 9 months ago
- ☆175 · Updated 2 months ago
- F Lite is a 10B parameter diffusion model created by Freepik and Fal, trained exclusively on copyright-safe and SFW content. ☆411 · Updated last month
- GRadient-INformed MoE ☆264 · Updated last year
- codes for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004) ☆635 · Updated 3 weeks ago
- ☆134 · Updated last month
- Pivotal Token Search ☆125 · Updated 2 months ago
- See Through Your Models ☆399 · Updated 2 months ago
- Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗 ☆277 · Updated 7 months ago
- LLM2CLIP makes SOTA pretrained CLIP model more SOTA ever. ☆551 · Updated 3 months ago
- A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private co… ☆290 · Updated last month