facebookresearch / MILS
Code release for "LLMs can see and hear without any training"
★456 · Updated 8 months ago
Alternatives and similar repositories for MILS
Users who are interested in MILS are comparing it to the libraries listed below.
- Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input ★934 · Updated 7 months ago
- MINT-1T: A one trillion token multimodal interleaved dataset. ★827 · Updated last year
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ★346 · Updated last year
- ★252 · Updated 10 months ago
- [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ★413 · Updated last month
- Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation" ★283 · Updated last month
- Build your own visual reasoning model ★416 · Updated last month
- Live-bending a foundation model's output at neural network level. ★271 · Updated 9 months ago
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ★310 · Updated 7 months ago
- Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model ★128 · Updated 5 months ago
- ★185 · Updated last month
- A character-level language diffusion model trained on Tiny Shakespeare ★824 · Updated last week
- GRadient-INformed MoE ★264 · Updated last year
- Official implementation of "Continuous Autoregressive Language Models" ★684 · Updated last month
- Codebase for Aria - an Open Multimodal Native MoE ★1,085 · Updated 11 months ago
- An open source implementation of CLIP (With TULIP Support) ★165 · Updated 7 months ago
- Voyager is an interactive RGBD video generation model conditioned on camera input, and supports real-time 3D reconstruction. ★1,476 · Updated 3 weeks ago
- Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024) ★342 · Updated 3 months ago
- ★494 · Updated last month
- F Lite is a 10B parameter diffusion model created by Freepik and Fal, trained exclusively on copyright-safe and SFW content. ★421 · Updated 4 months ago
- [ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve perfor… ★329 · Updated last year
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ★1,889 · Updated 3 months ago
- LLaVA-Interactive-Demo ★380 · Updated last year
- Kyutai with an "eye" ★232 · Updated 9 months ago
- LLM2CLIP makes SOTA pretrained CLIP model more SOTA ever. ★567 · Updated last month
- [ICML 2025] Official PyTorch implementation of LongVU ★417 · Updated 8 months ago
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B ★559 · Updated last month
- Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation ★817 · Updated 6 months ago
- A reimplementation of Stable Diffusion 3.5 in pure PyTorch ★691 · Updated 6 months ago
- Official PyTorch implementation of TokenSet. ★127 · Updated 9 months ago