facebookresearch / MILS
Code release for "LLMs can see and hear without any training"
★454 · Updated 6 months ago
Alternatives and similar repositories for MILS
Users who are interested in MILS are comparing it to the repositories listed below.
- MINT-1T: A one trillion token multimodal interleaved dataset. ★827 · Updated last year
- Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input. ★924 · Updated 5 months ago
- ★48 · Updated last week
- Build your own visual reasoning model. ★415 · Updated 2 weeks ago
- ★242 · Updated 8 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ★329 · Updated last year
- [ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning. ★408 · Updated this week
- Live-bending a foundation model's output at neural network level. ★271 · Updated 7 months ago
- Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation". ★269 · Updated 2 weeks ago
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs. ★310 · Updated 6 months ago
- [NeurIPS 2025] MMaDA - Open-Sourced Multimodal Large Diffusion Language Models. ★1,504 · Updated 2 weeks ago
- codes for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004). ★687 · Updated last month
- ★182 · Updated last week
- A character-level language diffusion model trained on Tiny Shakespeare. ★587 · Updated 2 weeks ago
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B. ★527 · Updated 2 weeks ago
- ★1,215 · Updated 2 weeks ago
- GRadient-INformed MoE. ★264 · Updated last year
- Official implementation of "Continuous Autoregressive Language Models". ★646 · Updated this week
- F Lite is a 10B parameter diffusion model created by Freepik and Fal, trained exclusively on copyright-safe and SFW content. ★417 · Updated 3 months ago
- Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im… ★3,018 · Updated last month
- Dream 7B, a large diffusion language model. ★1,094 · Updated last week
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents. ★1,867 · Updated 2 months ago
- See Through Your Models. ★402 · Updated 4 months ago
- [ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models. ★899 · Updated 4 months ago
- A minimal implementation of DeepMind's Genie world model. ★1,042 · Updated last week
- Kyutai with an "eye". ★225 · Updated 8 months ago
- StreamingVLM: Real-Time Understanding for Infinite Video Streams. ★731 · Updated last month
- Official PyTorch implementation of TokenSet. ★127 · Updated 8 months ago
- Pivotal Token Search. ★131 · Updated 4 months ago
- [ICML 2025] Official PyTorch implementation of LongVU. ★412 · Updated 6 months ago