Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
☆299 Mar 2, 2026 Updated 2 weeks ago
Alternatives and similar repositories for OneVision-Encoder
Users who are interested in OneVision-Encoder are comparing it to the repositories listed below.
- The official repo for the DanQing dataset. ☆32 Jan 16, 2026 Updated 2 months ago
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 Dec 14, 2023 Updated 2 years ago
- V-SWIFT: Training a Small VideoMAE Model on a Single Machine in a Day ☆29 Feb 5, 2025 Updated last year
- OVMR: Open-Vocabulary Recognition with Multi-Modal References (CVPR24) ☆36 Jun 16, 2025 Updated 9 months ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆18 Mar 15, 2024 Updated 2 years ago
- ☆22 Feb 13, 2026 Updated last month
- ☆18 Jul 10, 2024 Updated last year
- [NeurIPS 2025] Encoder-Decoder Diffusion Language Models for Efficient Training and Inference ☆36 Oct 29, 2025 Updated 4 months ago
- Official code for MotionBench (CVPR 2025) ☆70 Mar 3, 2025 Updated last year
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆219 Oct 12, 2025 Updated 5 months ago
- Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders ☆223 Feb 13, 2026 Updated last month
- ☆24 Feb 17, 2026 Updated last month
- [CVPR 2026] An official implementation of "Think Visually, Reason Textually: Vision-Language Synergy in ARC" ☆39 Nov 26, 2025 Updated 3 months ago
- Official Implementation for paper "Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm" ☆21 Mar 10, 2026 Updated last week
- [ICLR 2026] pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation ☆280 Feb 23, 2026 Updated 3 weeks ago
- ☆213 Dec 19, 2025 Updated 3 months ago
- A collection of awesome think with videos papers. ☆95 Dec 1, 2025 Updated 3 months ago
- Code for FreeTraj, a tuning-free method for trajectory-controllable video generation ☆111 Sep 19, 2025 Updated 6 months ago
- (CVPR 2026) Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation ☆30 Feb 28, 2026 Updated 3 weeks ago
- Offline implementation of UniREditBench: A Unified Reasoning-based Image Editing Benchmark. ☆54 Jan 7, 2026 Updated 2 months ago
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆17 Apr 2, 2025 Updated 11 months ago
- pytorch implementation of "Efficiently Reconstructing Dynamic Scenes One 🎯 D4RT at a Time" ☆48 Jan 27, 2026 Updated last month
- 📝 The official repository of "Rethinking Cross-Generator Image Forgery Detection through DINOv3" ☆21 Dec 2, 2025 Updated 3 months ago
- Official code for "Rethinking Chain-of-Thought Reasoning for Videos" ☆20 Dec 14, 2025 Updated 3 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆143 Aug 21, 2025 Updated 7 months ago
- When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought ☆27 Feb 14, 2026 Updated last month
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983… ☆83 Mar 9, 2026 Updated last week
- MLP version of SuperGaussians. ☆16 Dec 29, 2024 Updated last year
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM ☆20 May 22, 2025 Updated 10 months ago
- Code for the Molmo2 Vision-Language Model ☆397 Updated this week
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos ☆33 May 27, 2025 Updated 9 months ago
- Unlocking Iterative Reasoning for Any Image Editor ☆99 Jan 18, 2026 Updated 2 months ago
- Repository for "Echoes of the Coliseum: Towards 3D Live streaming of Sports Events" ☆27 Sep 4, 2025 Updated 6 months ago
- We introduce BabyVision, a benchmark revealing the infancy of AI vision. ☆197 Jan 13, 2026 Updated 2 months ago
- Cambrian-S: Towards Spatial Supersensing in Video ☆514 Dec 27, 2025 Updated 2 months ago
- PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation ☆56 Jan 5, 2026 Updated 2 months ago
- [CVPR 2025] EgoLife: Towards Egocentric Life Assistant ☆405 Mar 19, 2025 Updated last year
- Official Code of CVPR 2025 paper "SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters" ☆52 Jul 13, 2025 Updated 8 months ago
- Toolbox for GTA-Human Datasets ☆25 Oct 9, 2024 Updated last year