facebookresearch / MILS
Code release for "LLMs can see and hear without any training"
⭐ 458 · Updated 8 months ago
Alternatives and similar repositories for MILS
Users interested in MILS are comparing it to the libraries listed below.
- MINT-1T: A one-trillion-token multimodal interleaved dataset. ⭐ 828 · Updated last year
- Hypernetworks that adapt LLMs for specific benchmark tasks using only a textual task description as the input ⭐ 938 · Updated 7 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally applicable memory systems for transformers. ⭐ 347 · Updated last year
- ⭐ 256 · Updated 10 months ago
- Build your own visual reasoning model ⭐ 418 · Updated 3 weeks ago
- ⭐ 500 · Updated 2 months ago
- Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation" ⭐ 287 · Updated 2 months ago
- [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ⭐ 442 · Updated this week
- [ACL 2025] Rethinking Step-by-step Visual Reasoning in LLMs ⭐ 310 · Updated 8 months ago
- A character-level language diffusion model trained on Tiny Shakespeare ⭐ 842 · Updated 2 weeks ago
- ⭐ 182 · Updated 2 months ago
- [ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve perfor… ⭐ 329 · Updated last year
- A minimal implementation of DeepMind's Genie world model ⭐ 1,118 · Updated 2 months ago
- A reimplementation of Stable Diffusion 3.5 in pure PyTorch ⭐ 691 · Updated 7 months ago
- GRadient-INformed MoE ⭐ 264 · Updated last year
- Live-bending a foundation model's output at the neural-network level. ⭐ 273 · Updated 9 months ago
- F Lite is a 10B-parameter diffusion model created by Freepik and Fal, trained exclusively on copyright-safe and SFW content. ⭐ 426 · Updated 5 months ago
- See Through Your Models ⭐ 400 · Updated 6 months ago
- Official PyTorch implementation of TokenSet. ⭐ 127 · Updated 10 months ago
- ⭐ 140 · Updated last month
- [ICML 2025] Official PyTorch implementation of LongVU ⭐ 420 · Updated 8 months ago
- LLaVA-Interactive-Demo ⭐ 380 · Updated last year
- An open-source implementation of CLIP (with TULIP support) ⭐ 165 · Updated 8 months ago
- Voyager is an interactive RGBD video generation model conditioned on camera input that supports real-time 3D reconstruction. ⭐ 1,500 · Updated last month
- Code for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004) ⭐ 746 · Updated last month
- Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024) ⭐ 349 · Updated 2 weeks ago
- Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation ⭐ 823 · Updated 7 months ago
- Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model ⭐ 133 · Updated 5 months ago
- Official repo for Images that Sound: a special spectrogram, generated by diffusion models, that can be viewed as an image and played as sound ⭐ 248 · Updated 11 months ago
- ⭐ 197 · Updated 8 months ago