zhang9302002/ThinkingWithVideos

zhang9302002 / ThinkingWithVideos

The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"

☆102

Alternatives and similar repositories for ThinkingWithVideos

Users who are interested in ThinkingWithVideos are comparing it to the repositories listed below.

AndyTang15 / FLAG3Dv2
☆25 · May 9, 2024 · Updated 2 years ago
yongliu20 / Awesome-Unified-Understanding-and-Generation
☆52 · Aug 22, 2025 · Updated 11 months ago
shiyi-zh0408 / NAE_CVPR2024
[CVPR 2024] Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
☆43 · May 16, 2024 · Updated 2 years ago
VoyageWang / VG-Refiner
The repository of VG-Refiner paper
☆20 · Dec 9, 2025 · Updated 7 months ago
AndyTang15 / FLAG3D
☆19 · Jun 22, 2026 · Updated last month
Yxxxb / LAVT-RS
[CVPR'2022, TPAMI'2024] LAVT: Language-Aware Vision Transformer for Referring Segmentation
☆26 · Jan 21, 2025 · Updated last year
SuleBai / SC-CLIP
[TIP 2025] Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation
☆73 · Mar 27, 2026 · Updated 3 months ago
EvolvingLMMs-Lab / LongVT
[CVPR 2026] LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
☆255 · Jun 24, 2026 · Updated last month
Yxxxb / VoCo-LLaMA
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆205 · Jun 18, 2025 · Updated last year
RUC-NLPIR / VideoDeepResearch
☆155 · Nov 17, 2025 · Updated 8 months ago
shiyi-zh0408 / Meta-CoT
[CVPR 2026] Official code of the paper "Meta-CoT: Enhancing Granularity and Generalization in Image Editing"
☆78 · May 6, 2026 · Updated 2 months ago
marinero4972 / Open-o3-Video
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157 · May 1, 2026 · Updated 2 months ago
AMAP-ML / UniVG-R1
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
☆166 · Jun 2, 2025 · Updated last year
TencentARC / TimeLens
[CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
☆161 · Updated this week
IVGSZ / Flash-VStream
This is the official implementation of ICCV 2025 "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams"
☆287 · Oct 15, 2025 · Updated 9 months ago
EternalEvan / DPMesh
The repository contains the official implementation of "DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery", CVPR 2024
☆45 · Jun 4, 2024 · Updated 2 years ago
yunlong10 / Awesome-Video-LMM-Post-Training
🔥🔥🔥 Latest Papers, Codes and Datasets on Video-LMM Post-Training
☆296 · Mar 3, 2026 · Updated 4 months ago
EvolvingLMMs-Lab / ParaVT
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
☆54 · Jun 2, 2026 · Updated last month
lcqysl / FrameThinker
[ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"
☆50 · Oct 9, 2025 · Updated 9 months ago
Koreyoshi01 / VISD
This repository is the official implementation for VISD.
☆22 · May 17, 2026 · Updated 2 months ago
Tengbo-Yu / AnyBimanual
[ICCV2025] AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
☆103 · Jun 26, 2025 · Updated last year
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆882 · Dec 14, 2025 · Updated 7 months ago
InvincibleWyq / ChatVID
Chat about anything on any video!
☆39 · Sep 5, 2023 · Updated 2 years ago
IVUL-KAUST / VideoAuto-R1
[CVPR2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
☆88 · Feb 27, 2026 · Updated 4 months ago
Lzq5 / UniTime
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
☆56 · May 20, 2026 · Updated 2 months ago
RammusLeo / DPMesh
The repository contains the official implementation of "DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery"
☆25 · Jul 25, 2024 · Updated last year
BRZ911 / ViTCoT
[ACM MM 2025] ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models
☆18 · Jul 15, 2025 · Updated last year
fansunqi / VideoTool
Official Repository for NeurIPS'25 Paper "Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task"
☆23 · May 18, 2026 · Updated 2 months ago
qirui-chen / MultiHop-EgoQA
[AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
☆38 · May 27, 2025 · Updated last year
RammusLeo / ScoreHOI
Official repository of ScoreHOI (ICCV 2025)
☆16 · Dec 21, 2025 · Updated 7 months ago
Jixuan-Fan / Momentum-GS
[ICCV 2025] Code for Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction
☆173 · Dec 15, 2025 · Updated 7 months ago
QiWang98 / VideoRFT
[NeurIPS 2025] VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning
☆65 · Jan 6, 2026 · Updated 6 months ago
HumanMLLM / LOVE-R1
Official repository of paper "LOVE-R1: Advancing Long Video Understanding with Adaptive Zoom-in Mechanism via Multi-Step Reasoning"
☆24 · Nov 1, 2025 · Updated 8 months ago
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆726 · Sep 24, 2025 · Updated 9 months ago
zsgvivo / VideoZoomer
☆34 · Feb 12, 2026 · Updated 5 months ago
jbistanbul / universalvtg
Official Code for the paper "UniversalVTG: A Universal and Lightweight Foundation Model for Video Temporal Grounding"
☆15 · Apr 15, 2026 · Updated 3 months ago
KlingAIResearch / VANS
[CVPR 2026] Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
☆119 · Feb 28, 2026 · Updated 4 months ago
wgcyeo / WorldMM
[CVPR 2026 Highlight] WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
☆97 · Jun 18, 2026 · Updated last month
yongliu20 / SCAN
[CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration"
☆77 · Sep 23, 2024 · Updated last year