facebookresearch / MILS
Code release for "LLMs can see and hear without any training"
☆457 Updated 9 months ago
Alternatives and similar repositories for MILS
Users who are interested in MILS are comparing it to the repositories listed below
- Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input ☆938 Updated 8 months ago
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆828 Updated last year
- Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation" ☆288 Updated last week
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆310 Updated 8 months ago
- Build your own visual reasoning model ☆418 Updated 3 weeks ago
- ☆258 Updated 11 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆347 Updated last year
- [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ☆442 Updated last week
- Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model ☆133 Updated 6 months ago
- Live-bending a foundation model’s output at neural network level. ☆273 Updated 10 months ago
- [ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve perfor… ☆328 Updated last year
- LLaVA-Interactive-Demo ☆380 Updated last year
- See Through Your Models ☆400 Updated 7 months ago
- F Lite is a 10B parameter diffusion model created by Freepik and Fal, trained exclusively on copyright-safe and SFW content. ☆427 Updated 5 months ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,895 Updated 2 weeks ago
- ☆182 Updated 2 months ago
- A character-level language diffusion model trained on Tiny Shakespeare ☆849 Updated 3 weeks ago
- Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗 ☆297 Updated 11 months ago
- [ICML 2025] Official PyTorch implementation of LongVU ☆421 Updated 9 months ago
- An open source implementation of CLIP (With TULIP Support) ☆165 Updated 8 months ago
- ☆510 Updated last week
- GRadient-INformed MoE ☆264 Updated last year
- ☆140 Updated last month
- Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024) ☆348 Updated 3 weeks ago
- LLM2CLIP makes SOTA pretrained CLIP model more SOTA ever. ☆617 Updated last week
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models ☆1,569 Updated 2 months ago
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B ☆569 Updated 2 months ago
- We introduce CausalVQA, a benchmark dataset for video question answering (VQA) composed of question-answer pairs that probe models’ under… ☆52 Updated 5 months ago
- Official implementation of "Continuous Autoregressive Language Models" ☆726 Updated 2 months ago
- Voyager is an interactive RGBD video generation model conditioned on camera input, and supports real-time 3D reconstruction. ☆1,507 Updated last month