Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
☆235 · Feb 13, 2026 · Updated last week
Alternatives and similar repositories for OneVision-Encoder
Users who are interested in OneVision-Encoder are comparing it to the repositories listed below.
- [NeurIPS 2025] Encoder-Decoder Diffusion Language Models for Efficient Training and Inference ☆36 · Oct 29, 2025 · Updated 3 months ago
- The official repo for the DanQing dataset. ☆29 · Jan 16, 2026 · Updated last month
- ☆18 · Jul 10, 2024 · Updated last year
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 · Dec 14, 2023 · Updated 2 years ago
- PyTorch implementation of "Efficiently Reconstructing Dynamic Scenes One 🎯 D4RT at a Time" ☆43 · Jan 27, 2026 · Updated 3 weeks ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆213 · Oct 12, 2025 · Updated 4 months ago
- ☆213 · Dec 19, 2025 · Updated 2 months ago
- Offline implementation of UniREditBench: A Unified Reasoning-based Image Editing Benchmark. ☆52 · Jan 7, 2026 · Updated last month
- [ICLR 2026] pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation ☆268 · Feb 12, 2026 · Updated last week
- Reinforcing Text-Rich Video Reasoning with Visual Rumination ☆27 · Nov 24, 2025 · Updated 3 months ago
- The official implementation of Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion [AAAI'2… ☆15 · Feb 2, 2026 · Updated 3 weeks ago
- MLP version of SuperGaussians. ☆15 · Dec 29, 2024 · Updated last year
- ☆37 · Oct 29, 2025 · Updated 3 months ago
- Official code for "Rethinking Chain-of-Thought Reasoning for Videos" ☆20 · Dec 14, 2025 · Updated 2 months ago
- Code for FreeTraj, a tuning-free method for trajectory-controllable video generation ☆111 · Sep 19, 2025 · Updated 5 months ago
- [ICML 2024] Matrix Information Theory for Self-supervised Learning (https://arxiv.org/abs/2305.17326) ☆31 · Sep 21, 2025 · Updated 5 months ago
- Minute-long video generation at 24FPS. ☆50 · Feb 2, 2026 · Updated 3 weeks ago
- [SIGGRAPH Asia 2025] The official repo for the conference paper "MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized… ☆36 · Dec 13, 2025 · Updated 2 months ago
- 🚀 Official code for “XStreamVGGT: Extremely Memory-Efficient Streaming Vision Geometry Grounded Transformer with KV Cache Compression”, … ☆30 · Jan 27, 2026 · Updated 3 weeks ago
- [NAACL 2024] Z-GMOT: Zero-shot Generic Multiple Object Tracking ☆13 · May 3, 2024 · Updated last year
- When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought ☆26 · Feb 14, 2026 · Updated last week
- ☆21 · Feb 13, 2026 · Updated last week
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆17 · Apr 2, 2025 · Updated 10 months ago
- ☆28 · Apr 4, 2025 · Updated 10 months ago
- Towards Efficient Multimodal Large Language Models: A Survey on Token Compression ☆112 · Jan 13, 2026 · Updated last month
- ☆12 · Sep 19, 2022 · Updated 3 years ago
- Repository for "Echoes of the Coliseum: Towards 3D Live streaming of Sports Events" ☆27 · Sep 4, 2025 · Updated 5 months ago
- An unofficial implementation for paper "DenseCLIP: Extract Free Dense Labels from CLIP" ☆23 · Jan 27, 2022 · Updated 4 years ago
- OVMR: Open-Vocabulary Recognition with Multi-Modal References (CVPR24) ☆35 · Jun 16, 2025 · Updated 8 months ago
- [CVPR 2026] Official PyTorch implementation of "ReDirector: Creating Any-Length Video Retakes with Rotary Camera Encoding" ☆17 · Dec 17, 2025 · Updated 2 months ago
- Cambrian-S: Towards Spatial Supersensing in Video ☆494 · Dec 27, 2025 · Updated last month
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆140 · Aug 21, 2025 · Updated 6 months ago
- ☆56 · Feb 12, 2026 · Updated 2 weeks ago
- ☆21 · Nov 16, 2025 · Updated 3 months ago
- ☆31 · Jan 23, 2026 · Updated last month
- Code for paper: Freeplane: Unlocking Free Lunch in Triplane-Based Sparse-View Reconstruction Models ☆18 · Jun 6, 2024 · Updated last year
- Vision Large Language Models trained on M3IT instruction tuning dataset ☆17 · Aug 16, 2023 · Updated 2 years ago
- Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders ☆208 · Feb 13, 2026 · Updated last week
- [CVPR 2026] OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe ☆144 · Updated this week