This is a collection of recent papers on reasoning in video generation models.
☆153 · May 2, 2026 · Updated last week
Alternatives and similar repositories for Awesome-Video-Reasoning
Users that are interested in Awesome-Video-Reasoning are comparing it to the libraries listed below.
- This is a framework for evaluating reasoning in foundational Video Models. ☆89 · Apr 30, 2026 · Updated last week
- DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data ☆48 · Dec 12, 2025 · Updated 4 months ago
- [ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models ☆20 · Jul 17, 2024 · Updated last year
- [CVPR'26] VisPlay: Self-Evolving Vision-Language Models ☆56 · Feb 25, 2026 · Updated 2 months ago
- A Collection of Papers on Diffusion Language Models ☆169 · Sep 15, 2025 · Updated 7 months ago
- 🔥 An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks. ☆174 · Updated this week
- [CVPR 2025 highlight] Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision ☆42 · Dec 2, 2025 · Updated 5 months ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)" ☆25 · Feb 2, 2025 · Updated last year
- Official implementation of "Re-Aligning Language to Visual Objects with an Agentic Workflow" ☆32 · Apr 20, 2025 · Updated last year
- ☆16 · May 9, 2024 · Updated 2 years ago
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting" ☆44 · Oct 9, 2025 · Updated 7 months ago
- 🔥🔥 [NeurIPS2025] Exploring and mitigating semantic hallucinations in scene text perception and reasoning ☆28 · Dec 11, 2025 · Updated 4 months ago
- ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Unde… ☆63 · Mar 3, 2026 · Updated 2 months ago
- (ICLR 2026 🔥) Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs" ☆77 · Feb 9, 2026 · Updated 3 months ago
- Official repository for "Vid2World: Crafting Video Diffusion Models to Interactive World Models" (ICLR 2026), https://arxiv.org/abs/2505.… ☆54 · Jan 27, 2026 · Updated 3 months ago
- ☆55 · Sep 21, 2025 · Updated 7 months ago
- Beyond Accuracy: What Matters in Designing Well-Behaved Models? ☆19 · Mar 30, 2026 · Updated last month
- Awesome latest models, datasets and benchmarks on streaming/online video understanding. ☆27 · Oct 19, 2025 · Updated 6 months ago
- 🔥 A continuously updated collection of papers, datasets, and benchmarks on post-training and alignment for video generation. ☆134 · Apr 13, 2026 · Updated 3 weeks ago
- Video Diffusion Transformers are In-Context Learners ☆36 · Jan 6, 2025 · Updated last year
- A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts… ☆2,719 · Apr 14, 2026 · Updated 3 weeks ago
- Information and artifacts for "LoRA Learns Less and Forgets Less" (TMLR, 2024) ☆21 · Sep 27, 2024 · Updated last year
- Official PyTorch Implementation of "SVG-T2I: Scaling up Text-to-Image Latent Diffusion Model Without Variational Autoencoder". ☆149 · Dec 18, 2025 · Updated 4 months ago
- The first multiplayer video world model in Minecraft ☆193 · Mar 3, 2026 · Updated 2 months ago
- Star and Fork ⭐⭐⭐ ☆12 · Jun 9, 2023 · Updated 2 years ago
- [ACL 2026 Findings, ICCV 2025 Workshop Outstanding Paper Award] VChain: Chain-of-Visual-Thought for Reasoning in Video Generation ☆119 · Apr 8, 2026 · Updated last month
- ☆23 · Sep 5, 2025 · Updated 8 months ago
- Dialogue intent recognition based on PaddleNLP ☆10 · Apr 11, 2023 · Updated 3 years ago
- Visual Speech Recognition ☆20 · Dec 24, 2024 · Updated last year
- [ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models" ☆55 · Feb 10, 2025 · Updated last year
- ☆14 · Dec 31, 2024 · Updated last year
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning ☆36 · Jul 15, 2025 · Updated 9 months ago
- ☆24 · Nov 29, 2024 · Updated last year
- A list of works on video generation towards world model ☆472 · Mar 21, 2026 · Updated last month
- Code & data for "Towards flexible perception with visual memory" (ICML 2025) ☆18 · Sep 24, 2024 · Updated last year
- 📚 A curated collection of papers and open-source code repositories dedicated to the application of Vision-Language Models (VLMs) for str… ☆149 · Apr 13, 2026 · Updated 3 weeks ago
- Sandbox for generating visualizations of the bias-variance tradeoff for Machine Learning at Berkeley's blog. ☆13 · Jun 26, 2017 · Updated 8 years ago
- Robustness properties of Facebook's ResNeXt WSL models ☆15 · Dec 7, 2019 · Updated 6 years ago
- LVAS-Agent Code Base ☆20 · Apr 15, 2025 · Updated last year