MCG-NJU / Video-o3
Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning
☆36Updated this week
Alternatives and similar repositories for Video-o3
Users who are interested in Video-o3 are comparing it to the repositories listed below.
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence☆10Mar 2, 2025Updated 11 months ago
- Ready to run PyTorch implementation of Data2Vec 2.0: Highly efficient self-supervised representation learning for vision, speech and text…☆16Mar 29, 2023Updated 2 years ago
- ☆23Updated this week
- ☆16Sep 25, 2025Updated 4 months ago
- (ICML 2025) Rethinking Chain-of-Thought from the Perspective of Self-Training☆13Feb 15, 2025Updated last year
- Codes and data for AAAI-24 paper "Advancing Spatial Reasoning in Large Language Models: An In-depth Evaluation and Enhancement Using the …☆14Apr 23, 2024Updated last year
- Official code implementation for the paper "Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Expl…☆12Apr 4, 2025Updated 10 months ago
- Official repository for ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use☆28Nov 4, 2025Updated 3 months ago
- ☆21Nov 11, 2024Updated last year
- Thinking with Programming Vision: Towards a Unified View for Thinking with Images☆56Jan 23, 2026Updated 3 weeks ago
- ☆55Updated this week
- Nanjing University Advanced Machine Learning Review☆31Jun 11, 2025Updated 8 months ago
- ☆22Jan 17, 2025Updated last year
- Official Implementation of ISR-DPO:Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25)☆23Nov 25, 2025Updated 2 months ago
- [ICCV 2025] Boosting MLLM Reasoning with Text-Debiased Hint-GRPO☆43Jul 1, 2025Updated 7 months ago
- Official Implementation (Pytorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Capti…☆23Jan 26, 2025Updated last year
- STI-Bench : Are MLLMs Ready for Precise Spatial-Temporal World Understanding?☆36Jan 12, 2026Updated last month
- [NeurIPS 2025 Spotlight] Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning☆46Jan 20, 2026Updated 3 weeks ago
- EMNLP MAIN 2025 StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization☆59Sep 13, 2025Updated 5 months ago
- [2023]Run GraphCast in one click. The code will automatically install and run the model environment, automatically inferencing and demons…☆30Mar 23, 2024Updated last year
- Make Your Training Flexible: Towards Deployment-Efficient Video Models☆36Jun 11, 2025Updated 8 months ago
- Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning☆45Jul 2, 2025Updated 7 months ago
- A tool for packing and unpacking BigWorld compressed data sections from/to plain XML☆682Jul 20, 2025Updated 6 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆53Jul 23, 2025Updated 6 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"☆33Oct 12, 2024Updated last year
- Code for paper called Self-Training Elicits Concise Reasoning in Large Language Models☆42Apr 22, 2025Updated 9 months ago
- ☆57Oct 2, 2025Updated 4 months ago
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI☆251Oct 17, 2025Updated 4 months ago
- yolov8s-pose using ncnn inferring!☆44Apr 27, 2023Updated 2 years ago
- ☆40May 29, 2019Updated 6 years ago
- [NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning☆256Oct 18, 2025Updated 3 months ago
- ☆53Oct 10, 2024Updated last year
- ☆53Apr 21, 2024Updated last year
- ☆61Oct 13, 2023Updated 2 years ago
- Implementation of the paper "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations" in Pytorch.☆55May 19, 2023Updated 2 years ago
- Official Repository of "AtomThink: Multimodal Slow Thinking with Atomic Step Reasoning"☆62Nov 18, 2025Updated 2 months ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…☆53Oct 22, 2023Updated 2 years ago
- Pytorch implementation of BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active …☆56Apr 12, 2022Updated 3 years ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆64Jul 22, 2025Updated 6 months ago