facebookresearch / MILS
Code release for "LLMs can see and hear without any training"
☆454 Updated 7 months ago
Alternatives and similar repositories for MILS
Users who are interested in MILS are comparing it to the libraries listed below.
- Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input ☆928 Updated 6 months ago
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆827 Updated last year
- Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation" ☆278 Updated last month
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆343 Updated last year
- Build your own visual reasoning model ☆415 Updated last month
- [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ☆410 Updated 3 weeks ago
- ☆248 Updated 9 months ago
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆310 Updated 7 months ago
- ☆474 Updated 3 weeks ago
- Official implementation of "Continuous Autoregressive Language Models" ☆677 Updated 3 weeks ago
- Live-bending a foundation model's output at neural network level. ☆272 Updated 8 months ago
- A character-level language diffusion model trained on Tiny Shakespeare ☆610 Updated last month
- Pivotal Token Search ☆135 Updated this week
- ☆139 Updated last week
- Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model ☆128 Updated 4 months ago
- ☆185 Updated 3 weeks ago
- Official PyTorch implementation of TokenSet. ☆127 Updated 9 months ago
- [ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve perfor… ☆329 Updated last year
- GRadient-INformed MoE ☆265 Updated last year
- dLLM: Simple Diffusion Language Modeling ☆1,504 Updated this week
- LLaVA-Interactive-Demo ☆380 Updated last year
- [ICML 2025] Official PyTorch implementation of LongVU ☆412 Updated 7 months ago
- Codebase for Aria - an Open Multimodal Native MoE ☆1,085 Updated 11 months ago
- ☆1,233 Updated last month
- Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024) ☆333 Updated 2 months ago
- ☆81 Updated last year
- A Reproduction of GDM's Nested Learning Paper ☆463 Updated 3 weeks ago
- An open source implementation of CLIP (With TULIP Support) ☆164 Updated 7 months ago
- F Lite is a 10B parameter diffusion model created by Freepik and Fal, trained exclusively on copyright-safe and SFW content. ☆420 Updated 4 months ago
- Code for the Molmo Vision-Language Model ☆839 Updated last year