facebookresearch / MILS
Code release for "LLMs can see and hear without any training"
☆445 · Updated last month
Alternatives and similar repositories for MILS
Users who are interested in MILS are comparing it to the repositories listed below.
- Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input ☆762 · Updated 3 weeks ago
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆817 · Updated 11 months ago
- Live-bending a foundation model’s output at neural network level. ☆261 · Updated 2 months ago
- OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ☆272 · Updated last month
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆303 · Updated last month
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models ☆1,157 · Updated 2 weeks ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆312 · Updated 8 months ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,735 · Updated last month
- Build your own visual reasoning model ☆390 · Updated 2 weeks ago
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models ☆712 · Updated this week
- Kyutai with an "eye" ☆201 · Updated 3 months ago
- PyTorch implementation of models from the Zamba2 series. ☆182 · Updated 5 months ago
- Dream 7B, a large diffusion language model ☆791 · Updated 2 weeks ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆340 · Updated 6 months ago
- Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs ☆810 · Updated 2 months ago
- [ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve perfor… ☆323 · Updated last year
- See Through Your Models ☆395 · Updated 3 months ago
- Implementation for Describe Anything: Detailed Localized Image and Video Captioning ☆1,195 · Updated last week
- ☆159 · Updated last month
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models ☆228 · Updated 9 months ago
- Liquid: Language Models are Scalable and Unified Multi-modal Generators ☆594 · Updated 2 months ago
- Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation ☆771 · Updated 2 weeks ago
- ☆211 · Updated 3 months ago
- F Lite is a 10B parameter diffusion model created by Freepik and Fal, trained exclusively on copyright-safe and SFW content. ☆389 · Updated 2 weeks ago
- Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model ☆99 · Updated 3 months ago
- [Fully open] [Encoder-free MLLM] Vision as LoRA ☆310 · Updated 3 weeks ago
- LLM2CLIP makes the SOTA pretrained CLIP model even more SOTA. ☆530 · Updated last week
- Codebase for Aria - an Open Multimodal Native MoE ☆1,056 · Updated 5 months ago
- Frontier Multimodal Foundation Models for Image and Video Understanding ☆869 · Updated last month
- Official PyTorch implementation of TokenSet. ☆121 · Updated 3 months ago