Code for the Molmo2 Vision-Language Model
☆444 · Mar 18, 2026 · Updated last week
Alternatives and similar repositories for molmo2
Users who are interested in molmo2 are comparing it to the repositories listed below.
- OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams ☆55 · Mar 15, 2026 · Updated last week
- [CVPR 2026] Variation-aware Vision Token Dropping for Faster Large Vision-Language Models ☆28 · Mar 18, 2026 · Updated last week
- More reliable Video Understanding Evaluation ☆14 · Sep 23, 2025 · Updated 6 months ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆72 · Nov 27, 2024 · Updated last year
- ☆14 · Sep 11, 2025 · Updated 6 months ago
- ☆33 · Aug 25, 2025 · Updated 7 months ago
- Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence ☆305 · Mar 2, 2026 · Updated 3 weeks ago
- SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis ☆37 · Jun 13, 2025 · Updated 9 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆51 · Jun 12, 2025 · Updated 9 months ago
- [ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation ☆57 · Sep 12, 2025 · Updated 6 months ago
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation". ☆55 · May 25, 2025 · Updated 10 months ago
- ☆46 · Jun 24, 2025 · Updated 9 months ago
- [NeurIPS'24 spotlight] MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning. [TPAMI'25] MECD+ ☆47 · Feb 11, 2026 · Updated last month
- [MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆154 · Jul 22, 2025 · Updated 8 months ago
- [CVPR 2022] Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency ☆18 · Aug 10, 2022 · Updated 3 years ago
- Quick Long Video Understanding [TMLR2025] ☆76 · Oct 27, 2025 · Updated 4 months ago
- PyTorch implementation of NEPA ☆328 · Feb 9, 2026 · Updated last month
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning ☆36 · Jul 15, 2025 · Updated 8 months ago
- Multimodal RewardBench ☆64 · Feb 21, 2025 · Updated last year
- [CVPR 2026] An accurate and dense-annotated synthetic dataset for training SOTA detectors / segmentors / Grounding-VLMs. ☆87 · Feb 23, 2026 · Updated last month
- [ICLR 2026] The official repository for paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning" ☆167 · Jan 26, 2026 · Updated 2 months ago
- [EMNLP 2025 Main] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆68 · Mar 13, 2026 · Updated last week
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models ☆18 · Nov 4, 2025 · Updated 4 months ago
- Fully Open Framework for Democratized Multimodal Training ☆770 · Dec 27, 2025 · Updated 3 months ago
- ☆19 · Oct 28, 2025 · Updated 4 months ago
- ☆19 · Sep 2, 2025 · Updated 6 months ago
- TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics ☆55 · Mar 6, 2026 · Updated 3 weeks ago
- Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity ☆22 · Aug 28, 2025 · Updated 6 months ago
- ☆20 · Mar 17, 2026 · Updated last week
- AutoGaze automatically removes redundant patches in a video, reducing #tokens in ViT/MLLM by 4x-100x. ☆156 · Mar 19, 2026 · Updated last week
- FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens ☆17 · Sep 8, 2025 · Updated 6 months ago
- Pixel-Level Reasoning Model trained with RL [NeurIPS25] ☆284 · Nov 6, 2025 · Updated 4 months ago
- This is the official repository of LLAVIDAL ☆23 · Oct 4, 2025 · Updated 5 months ago
- ☆13 · Jun 26, 2024 · Updated last year
- Extending context length of visual language models ☆12 · Dec 18, 2024 · Updated last year
- Reinforcing Text-Rich Video Reasoning with Visual Rumination ☆27 · Nov 24, 2025 · Updated 4 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆31 · Aug 7, 2025 · Updated 7 months ago
- ☆14 · May 20, 2025 · Updated 10 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆143 · Aug 21, 2025 · Updated 7 months ago