facebookresearch / MILS
Code release for "LLMs can see and hear without any training"
☆432 · Updated last week
Alternatives and similar repositories for MILS
Users that are interested in MILS are comparing it to the libraries listed below
- Dream 7B, a large diffusion language model ☆630 · Updated 2 weeks ago
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆811 · Updated 9 months ago
- Build your own visual reasoning model ☆362 · Updated this week
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,650 · Updated last week
- Rethinking Step-by-step Visual Reasoning in LLMs ☆293 · Updated 3 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆307 · Updated 6 months ago
- LLM2CLIP makes SOTA pretrained CLIP model more SOTA ever. ☆513 · Updated last month
- Pretraining code for a large-scale depth-recurrent language model ☆760 · Updated last month
- Live-bending a foundation model’s output at neural network level. ☆249 · Updated last month
- Code for the Molmo Vision-Language Model ☆418 · Updated 5 months ago
- OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning ☆214 · Updated this week
- Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation ☆759 · Updated 9 months ago
- Continuous Thought Machines, because thought takes time and reasoning is a process. ☆492 · Updated this week
- Agent Reinforcement Trainer for training multi-turn agents using GRPO ☆560 · Updated this week
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects. ☆1,281 · Updated 3 weeks ago
- Understanding R1-Zero-Like Training: A Critical Perspective ☆925 · Updated last month
- Kyutai with an "eye" ☆191 · Updated last month
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆327 · Updated 5 months ago
- Implementation for Describe Anything: Detailed Localized Image and Video Captioning ☆1,065 · Updated last week
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models ☆595 · Updated last month
- ☆151 · Updated last week
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More! ☆1,071 · Updated last week
- Official PyTorch implementation of TokenSet. ☆118 · Updated last month
- PyTorch implementation of models from the Zamba2 series. ☆181 · Updated 3 months ago
- A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time! ☆1,058 · Updated 3 months ago
- Explore the Multimodal “Aha Moment” on 2B Model ☆586 · Updated 2 months ago
- ☆79 · Updated 2 months ago
- LIMO: Less is More for Reasoning ☆940 · Updated last month
- Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model ☆92 · Updated last month
- ☆120 · Updated 8 months ago