This is a collection of recent papers on reasoning in video generation models.
☆154 · May 13, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for Awesome-Video-Reasoning
Users that are interested in Awesome-Video-Reasoning are comparing it to the libraries listed below.
- This is a framework for evaluating reasoning in foundational Video Models. ☆96 · May 5, 2026 · Updated 3 weeks ago
- [ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models ☆20 · Jul 17, 2024 · Updated last year
- [NeurIPS 2025] | DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data ☆50 · Dec 12, 2025 · Updated 5 months ago
- A collection of awesome think with videos papers. ☆98 · Dec 1, 2025 · Updated 5 months ago
- [CVPR'26] VisPlay: Self-Evolving Vision-Language Models ☆57 · Feb 25, 2026 · Updated 3 months ago
- A Collection of Papers on Diffusion Language Models ☆171 · Sep 15, 2025 · Updated 8 months ago
- Are Video Models Ready as Zero-shot Reasoners? ☆87 · Nov 24, 2025 · Updated 6 months ago
- 🔥An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks. ☆182 · May 5, 2026 · Updated 3 weeks ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)" ☆25 · Feb 2, 2025 · Updated last year
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight) ☆23 · Aug 1, 2025 · Updated 9 months ago
- ☆16 · May 9, 2024 · Updated 2 years ago
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting" ☆48 · Oct 9, 2025 · Updated 7 months ago
- 🔥🔥[NeurIPS2025]Exploring and mitigating semantic hallucinations in scene text perception and reasoning ☆29 · Dec 11, 2025 · Updated 5 months ago
- ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Unde… ☆65 · Mar 3, 2026 · Updated 2 months ago
- Official repository for "Vid2World: Crafting Video Diffusion Models to Interactive World Models" (ICLR 2026), https://arxiv.org/abs/2505.… ☆60 · Jan 27, 2026 · Updated 4 months ago
- (ICLR 2026 🔥) Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs" ☆77 · Feb 9, 2026 · Updated 3 months ago
- ☆55 · Sep 21, 2025 · Updated 8 months ago
- Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models (ACL-Findings 2024) ☆16 · Apr 23, 2024 · Updated 2 years ago
- Beyond Accuracy: What Matters in Designing Well-Behaved Models? ☆20 · Mar 30, 2026 · Updated 2 months ago
- CaptionQA: Is Your Caption as Useful as the Image Itself? ☆35 · Mar 3, 2026 · Updated 2 months ago
- Awesome latest models, datasets and benchmarks on streaming/online video understanding. ☆28 · Oct 19, 2025 · Updated 7 months ago
- 🔥 A continuously updated collection of papers, datasets, and benchmarks on post-training and alignment for video generation. ☆142 · Apr 13, 2026 · Updated last month
- Video Diffusion Transformers are In-Context Learners ☆36 · Jan 6, 2025 · Updated last year
- Official implementation of "CONCRETE: Improving Cross-lingual Fact Checking with Cross-lingual Retrieval" (COLING'22) ☆15 · Oct 13, 2022 · Updated 3 years ago
- Information and artifacts for "LoRA Learns Less and Forgets Less" (TMLR, 2024) ☆21 · Sep 27, 2024 · Updated last year
- A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts… ☆2,902 · Updated this week
- ☆14 · Mar 15, 2025 · Updated last year
- ☆23 · Sep 5, 2025 · Updated 8 months ago
- [ACL 2026 Findings, ICCV 2025 Workshop Outstanding Paper Award] VChain: Chain-of-Visual-Thought for Reasoning in Video Generation ☆119 · Apr 8, 2026 · Updated last month
- Visual Speech Recognition ☆20 · Dec 24, 2024 · Updated last year
- ☆87 · Oct 10, 2025 · Updated 7 months ago
- [ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models" ☆56 · Feb 10, 2025 · Updated last year
- ViT models pretrained with up to ~5k hours of human-like video data ☆14 · Aug 10, 2023 · Updated 2 years ago
- ☆14 · Dec 31, 2024 · Updated last year
- ☆24 · Nov 29, 2024 · Updated last year
- Official implementation of the ACL 2023 paper: "Zero-shot Faithful Factual Error Correction" ☆17 · Aug 14, 2023 · Updated 2 years ago
- ☆13 · Jul 19, 2022 · Updated 3 years ago
- A list of works on video generation towards world model ☆480 · Mar 21, 2026 · Updated 2 months ago
- The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models? ☆43 · Nov 1, 2024 · Updated last year