facebookresearch / MILS
Code release for "LLMs can see and hear without any training"
☆231 · Updated last month
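For context, the MILS paper pairs a text-only LLM with a pretrained multimodal scorer in a gradient-free feedback loop: the LLM proposes candidates, the scorer ranks them against the input, and the top-scoring pairs are fed back as context for the next round. Below is a minimal sketch of that loop, assuming Hugging Face CLIP as the scorer; `propose_captions` and `mils_loop` are hypothetical names, not the repo's actual API.

```python
# Minimal sketch of the propose / score / feed-back loop from
# "LLMs can see and hear without any training" (MILS). Assumptions:
# CLIP from Hugging Face transformers as the scorer; propose_captions
# stands in for any text-only LLM call. Names here are hypothetical
# and do not mirror the MILS repo's actual API.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_scores(image: Image.Image, captions: list[str]) -> list[float]:
    # Image-text similarity logits for each candidate caption.
    inputs = processor(text=captions, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    return out.logits_per_image[0].tolist()

def mils_loop(image, propose_captions, rounds=5, k=8):
    # propose_captions(feedback) -> list[str]: any off-the-shelf LLM
    # prompted with the best (caption, score) pairs so far. No gradients
    # flow anywhere, so neither model is trained.
    feedback = []
    for _ in range(rounds):
        candidates = propose_captions(feedback)[:k]
        scores = clip_scores(image, candidates)
        feedback = sorted(zip(candidates, scores), key=lambda p: -p[1])
    return feedback[0][0]  # highest-scoring caption found
```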
Alternatives and similar repositories for MILS:
Users interested in MILS are comparing it to the repositories listed below
- Rethinking Step-by-step Visual Reasoning in LLMs ☆282 · Updated 2 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ☆126 · Updated 4 months ago
- An open source implementation of CLIP (with TULIP support) ☆113 · Updated last week
- Official PyTorch implementation of TokenSet ☆104 · Updated last week
- Long Context Transfer from Language to Vision ☆368 · Updated 2 weeks ago
- Kyutai with an "eye" ☆160 · Updated this week
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆200 · Updated 2 months ago
- [CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models ☆276 · Updated 11 months ago
- HPT - Open Multimodal LLMs from HyperGAI ☆314 · Updated 9 months ago
- LLM2CLIP makes a SOTA pretrained CLIP model even more SOTA ☆495 · Updated last week
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models ☆208 · Updated 6 months ago
- Official implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining" ☆551 · Updated 7 months ago
- A Unified Tokenizer for Visual Generation and Understanding ☆216 · Updated 3 weeks ago
- Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆57 · Updated 3 months ago
- [CVPR 2025] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆172 · Updated 2 weeks ago
- [ECCV 2024] Official PyTorch implementation code for realizing the technical part of Mixture of All Intelligence (MoAI) to improve performance… ☆319 · Updated last year
- Explore the Multimodal "Aha Moment" on a 2B Model ☆538 · Updated 2 weeks ago
- Anole: An Open, Autoregressive and Native Multimodal Model for Interleaved Image-Text Generation ☆734 · Updated 7 months ago
- This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?" ☆128 · Updated 9 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆145 · Updated 9 months ago
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ☆179 · Updated last week
- Official repository for the paper PLLaVA ☆644 · Updated 8 months ago
- Code for the Molmo Vision-Language Model ☆348 · Updated 3 months ago
- [ICLR 2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant ☆234 · Updated 7 months ago
- Multimodal Models in Real World ☆455 · Updated last month
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆249 · Updated 2 months ago
- LLaVA-Interactive-Demo ☆367 · Updated 8 months ago