allenai/molmo2

Code for the Molmo2 Vision-Language Model

☆692

Alternatives and similar repositories for molmo2

Users who are interested in molmo2 are comparing it to the libraries listed below.

EvolvingLMMs-Lab / LLaVA-OneVision-2
Fully Open Framework for Democratized Multimodal Training
☆1,143 · Updated this week
allenai / molmo
Code for the Molmo Vision-Language Model
☆918 · Dec 12, 2024 · Updated last year
EvolvingLMMs-Lab / OneVision-Encoder
Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
☆385 · Jun 20, 2026 · Updated last month
tencent-ailab / Penguin-VL
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders [Technical Report]
☆204 · Mar 30, 2026 · Updated 3 months ago
IDEA-Research / Rex-Omni
[CVPR 2026] Detect Anything via Next Point Prediction
☆1,507 · Feb 22, 2026 · Updated 4 months ago
RAIVNLab / VideoNet
CVPR '26 Highlight
☆25 · May 6, 2026 · Updated 2 months ago
RAIVNLab / VFig
This is the repository for VFig: Vectorizing Complex Figures with Vision-Language Models
☆19 · Apr 24, 2026 · Updated 2 months ago
cambrian-mllm / cambrian-s
Cambrian-S: Towards Spatial Supersensing in Video
☆560 · Apr 3, 2026 · Updated 3 months ago
facebookresearch / perception_models
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
☆2,324 · Apr 13, 2026 · Updated 3 months ago
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆726 · Sep 24, 2025 · Updated 9 months ago
facebookresearch / tuna-2
Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation
☆738 · Updated this week
open-compass / VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs); supports 220+ LMMs and 80+ benchmarks
☆4,291 · Updated this week
allenai / WildDet3D
Allen Institute for AI: WildDet3D: Scaling Promptable 3D Detection in the Wild
☆597 · Jun 1, 2026 · Updated last month
allenai / MolmoBot
Code and website for "MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation".
☆96 · Jun 4, 2026 · Updated last month
facebookresearch / dinov3
Reference PyTorch implementation and models for DINOv3
☆10,973 · Updated this week
mit-han-lab / streaming-vlm
StreamingVLM: Real-Time Understanding for Infinite Video Streams
☆1,046 · Oct 15, 2025 · Updated 9 months ago
UCSC-VLAA / OpenVision
OpenVision (ICCV 2025), OpenVision 2 (CVPR 2026), and OpenVision 3
☆487 · Feb 21, 2026 · Updated 5 months ago
baaivision / Emu3.5
Native Multimodal Models are World Learners
☆1,536 · Dec 30, 2025 · Updated 6 months ago
Yangr116 / VST
[ECCV 2026] Visual Spatial Tuning
☆198 · Mar 25, 2026 · Updated 3 months ago
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆65 · Jul 22, 2025 · Updated 11 months ago
facebookresearch / Action100M
A Large-scale Video Action Dataset
☆481 · Jan 16, 2026 · Updated 6 months ago
THU-SI / Spatial-TTT
[ECCV 2026] Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training
☆238 · Jun 19, 2026 · Updated last month
QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
☆19,630 · Jan 30, 2026 · Updated 5 months ago
facebookresearch / sam3
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading t…
☆11,016 · Updated this week
allenai / molmoweb
☆579 · Jun 26, 2026 · Updated 3 weeks ago
zlab-princeton / vero
Vero: An Open RL Recipe for General Visual Reasoning
☆134 · Jun 19, 2026 · Updated last month
cyuQ1n / EasyVideoR1
☆156 · Apr 27, 2026 · Updated 2 months ago
allenai / OLMo-core
PyTorch building blocks for the OLMo ecosystem
☆1,402 · Updated this week
EvolvingLMMs-Lab / LongVT
[CVPR 2026] LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
☆254 · Jun 24, 2026 · Updated 3 weeks ago
ByteDance-Seed / Bagel
Open-source unified multimodal model
☆6,103 · May 4, 2026 · Updated 2 months ago
google-deepmind / tips
TIPSv2 (CVPR'26) and TIPS (ICLR'25)
☆572 · Jun 1, 2026 · Updated last month
LLaVA-VL / LLaVA-NeXT
☆4,709 · Jun 15, 2026 · Updated last month
SihanXU / nepa
PyTorch implementation of NEPA
☆338 · Feb 9, 2026 · Updated 5 months ago
Visual-Agent / DeepEyes
☆1,249 · Nov 20, 2025 · Updated 8 months ago
tulerfeng / OneThinker
🔥 OneThinker: All-in-one Reasoning Model for Image and Video [CVPR 2026]
☆463 · Feb 28, 2026 · Updated 4 months ago
mbzuai-oryx / VideoMolmo
Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing"
☆56 · Jul 5, 2025 · Updated last year
facebookresearch / DepthLM_Official
[ICLR 2026 Oral (top 1.2%)] Official implementation of DepthLM
☆362 · Jun 1, 2026 · Updated last month
Go2Heart / OmniStream
[ECCV 2026] OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams
☆113 · Mar 15, 2026 · Updated 4 months ago
marinero4972 / Open-o3-Video
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157 · May 1, 2026 · Updated 2 months ago