zsgvivo/VideoZoomer

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/zsgvivo/VideoZoomer)

zsgvivo / VideoZoomer

☆34

Alternatives and similar repositories for VideoZoomer

Users that are interested in VideoZoomer are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

alchemistyzz / PeRL
View on GitHub
[NeurIPS'25] The official code of "PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning"
☆30Mar 30, 2026Updated 3 months ago
zss02 / BiPS
View on GitHub
[CVPR 2026] See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning
☆22Jun 28, 2026Updated last month
lcqysl / FrameThinker
View on GitHub
[ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"
☆50Oct 9, 2025Updated 9 months ago
Ziyang412 / Video-RTS
View on GitHub
Code for EMNLP25 paper "Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning"
☆24Feb 18, 2026Updated 5 months ago
qishisuren123 / AnyCap
View on GitHub
A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca…
☆54Jul 24, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
microsoft / PixelCraft
View on GitHub
[ICLR 2026] High-Fidelity Visual Reasoning on Structured Images
☆30Jul 17, 2026Updated last week
wangruohui / EfficientVideoAgent
View on GitHub
EVA: Efficient Reinforcement Learning for End-to-End Video Agent
☆26May 6, 2026Updated 2 months ago
EvolvingLMMs-Lab / ParaVT
View on GitHub
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
☆54Jun 2, 2026Updated last month
CYWang735 / AdaTooler-V
View on GitHub
☆72Feb 27, 2026Updated 5 months ago
Jialuo-Li / DIG
View on GitHub
[CVPR 2026] Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
☆21Feb 21, 2026Updated 5 months ago
longvideoagent / LongVideoAgent
View on GitHub
☆120Updated this week
FreedomIntelligence / MyPhoneBench
View on GitHub
MyPhoneBench: Do Phone-Use Agents Respect Your Privacy?
☆24Apr 3, 2026Updated 3 months ago
Haiyang0226 / Symphony
View on GitHub
code of cvpr26 paper Symphony
☆17Apr 7, 2026Updated 3 months ago
SalesforceAIResearch / ActiveVideoPerception
View on GitHub
Official Code for paper "Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding""
☆18Jun 2, 2026Updated last month
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
zhang9302002 / ThinkingWithVideos
View on GitHub
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
☆102Oct 15, 2025Updated 9 months ago
EliSpectre / MM-Mem
View on GitHub
[ACL-26 (main)] From Verbatim to Gist Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck for Long-Horizon Video A…
☆39Apr 19, 2026Updated 3 months ago
hmxiong / StreamChat
View on GitHub
Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025
☆111Mar 14, 2025Updated last year
OuyangKun10 / Conan
View on GitHub
Multi-step reasoning MLLM
☆25Mar 8, 2026Updated 4 months ago
64327069 / LVAgent
View on GitHub
Code of LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
☆39Nov 24, 2025Updated 8 months ago
MCG-NJU / Video-o3
View on GitHub
[ICML 2026] Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning
☆135Jul 2, 2026Updated 3 weeks ago
GeWu-Lab / APPO
View on GitHub
The official repository for CVPR'26 Paper "APPO: Attention-guided Perception Policy Optimization for Video Reasoning"
☆17Mar 19, 2026Updated 4 months ago
egolife-ai / Ego-R1
View on GitHub
[TPAMI 2026] Ego-R1: Agentic Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
☆165Jun 10, 2026Updated last month
PhoneBuddyAI / phonebuddy
View on GitHub
Training open models for agentic phone use with real-app and mock-app environments.
☆52Jun 26, 2026Updated last month
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
yangruoliu / VideoDetective
View on GitHub
VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding
☆58May 1, 2026Updated 2 months ago
allenai / SAGE
View on GitHub
[arXiv 2025] SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
☆70Dec 17, 2025Updated 7 months ago
wanglu-cs / Think_While_Watching
View on GitHub
☆19Jun 26, 2026Updated last month
marinero4972 / Open-o3-Video
View on GitHub
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157May 1, 2026Updated 2 months ago
jiyt17 / ReDiff
View on GitHub
Codebase of 'From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model'
☆45Jun 27, 2026Updated last month
Cominclip / OmniVerifier
View on GitHub
[ICLR 2026 Oral & ICML 2026] Generative Universal Verifier as Multimodal Meta-Reasoner
☆64May 29, 2026Updated 2 months ago
fansunqi / VideoTool
View on GitHub
Official Repository for NeurIPS'25 Paper "Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task"
☆23May 18, 2026Updated 2 months ago
qirui-chen / MultiHop-EgoQA
View on GitHub
[AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
☆38May 27, 2025Updated last year
JIA-Lab-research / MAT
View on GitHub
MAT: Mask-Aware Transformer for Large Hole Image Inpainting
☆17Apr 1, 2022Updated 4 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
PhoneHarness / PhoneHarness
View on GitHub
PhoneHarness runtime harness for mixed-action phone agents
☆35Jun 17, 2026Updated last month
ModalityDance / Omni-R1
View on GitHub
[ACL 2026 Findings] "Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning"
☆63May 26, 2026Updated 2 months ago
Mini-o3 / Mini-o3
View on GitHub
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆423Jan 29, 2026Updated 6 months ago
OpenGVLab / VideoChat-R1
View on GitHub
[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
☆268Oct 18, 2025Updated 9 months ago
PKU-YuanGroup / Look-Back
View on GitHub
This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning".
☆100Jul 10, 2025Updated last year
Candice-yu / GeoLaux
View on GitHub
A Benchmark for Evaluating MLLMs' Geometry Performance on Long-Step Problems Requiring Auxiliary Lines
☆38Apr 27, 2026Updated 3 months ago
wgcyeo / WorldMM
View on GitHub
[CVPR 2026 Highlight] WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
☆98Jun 18, 2026Updated last month