This is a collection of recent papers on reasoning in video generation models.
☆145Mar 23, 2026Updated this week
Alternatives and similar repositories for Awesome-Video-Reasoning
Users that are interested in Awesome-Video-Reasoning are comparing it to the libraries listed below.
- This is a framework for evaluating reasoning in foundational Video Models.☆83Mar 7, 2026Updated 3 weeks ago
- DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data☆46Dec 12, 2025Updated 3 months ago
- [ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models☆20Jul 17, 2024Updated last year
- VisPlay: Self-Evolving Vision-Language Models☆51Feb 25, 2026Updated last month
- A collection of awesome think with videos papers.☆95Dec 1, 2025Updated 3 months ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"☆25Feb 2, 2025Updated last year
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight)☆21Aug 1, 2025Updated 7 months ago
- ☆15May 9, 2024Updated last year
- Official implementation of "Re-Aligning Language to Visual Objects with an Agentic Workflow"☆32Apr 20, 2025Updated 11 months ago
- 🔥🔥[NeurIPS2025]Exploring and mitigating semantic hallucinations in scene text perception and reasoning☆27Dec 11, 2025Updated 3 months ago
- ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Unde…☆56Mar 3, 2026Updated 3 weeks ago
- Official repository for "Vid2World: Crafting Video Diffusion Models to Interactive World Models" (ICLR 2026), https://arxiv.org/abs/2505.…☆47Jan 27, 2026Updated 2 months ago
- ☆55Sep 21, 2025Updated 6 months ago
- A Collection of Papers on Diffusion Language Models☆161Sep 15, 2025Updated 6 months ago
- Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models (ACL-Findings 2024)☆16Apr 23, 2024Updated last year
- The first multiplayer video world model in Minecraft☆173Mar 3, 2026Updated 3 weeks ago
- CaptionQA: Is Your Caption as Useful as the Image Itself?☆36Mar 3, 2026Updated 3 weeks ago
- Awesome latest models, datasets and benchmarks on streaming/online video understanding.☆24Oct 19, 2025Updated 5 months ago
- Visual Speech Recognition☆20Dec 24, 2024Updated last year
- Information and artifacts for "LoRA Learns Less and Forgets Less" (TMLR, 2024)☆20Sep 27, 2024Updated last year
- Official PyTorch Implementation of "SVG-T2I: Scaling up Text-to-Image Latent Diffusion Model Without Variational Autoencoder".☆141Dec 18, 2025Updated 3 months ago
- Start and Fork ⭐⭐⭐☆12Jun 9, 2023Updated 2 years ago
- ☆14Mar 15, 2025Updated last year
- [ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models"☆54Feb 10, 2025Updated last year
- ViT models pretrained with up to ~5k hours of human-like video data☆14Aug 10, 2023Updated 2 years ago
- ☆14Dec 31, 2024Updated last year
- A list of works on video generation towards world model☆443Mar 21, 2026Updated last week
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆36Jul 15, 2025Updated 8 months ago
- ☆23Nov 29, 2024Updated last year
- ☆13Jul 19, 2022Updated 3 years ago
- Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning☆137Mar 6, 2026Updated 3 weeks ago
- DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing (WACV 2025)☆13Feb 7, 2026Updated last month
- Code & data for "Towards flexible perception with visual memory" (ICML 2025)☆18Sep 24, 2024Updated last year
- [CVPR 2026] ZoomEarth: Active Perception for Ultra-High-Resolution Geospatial Vision-Language Tasks☆32Mar 10, 2026Updated 2 weeks ago
- Official code release for the paper Trapped in texture bias? A large scale comparison of deep instance segmentation, accepted at ECCV 202…☆16Jan 16, 2024Updated 2 years ago
- Robustness properties of Facebook's ResNeXt WSL models☆15Dec 7, 2019Updated 6 years ago
- [ICLR 2026] FOCUS: Efficient Keyframe Selection for Long Video Understanding☆57Feb 3, 2026Updated last month
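- Course materials for 2024~2025 at the Yanqi Lake campus of the University of Chinese Academy of Sciences (UCAS), including reinforcement learning, intelligent computing systems, pattern recognition, matrix analysis and applications, principles and algorithms of artificial intelligence, and natural language processing☆39Sep 22, 2025Updated 6 months ago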
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos☆34May 27, 2025Updated 10 months ago