This is a collection of recent papers on reasoning in video generation models.
☆150 · Mar 30, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for Awesome-Video-Reasoning
Users that are interested in Awesome-Video-Reasoning are comparing it to the libraries listed below.
- This is a framework for evaluating reasoning in foundational Video Models. ☆88 · Apr 1, 2026 · Updated 2 weeks ago
- [ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models ☆20 · Jul 17, 2024 · Updated last year
- DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data ☆47 · Dec 12, 2025 · Updated 4 months ago
- A collection of awesome think with videos papers. ☆98 · Dec 1, 2025 · Updated 4 months ago
- Are Video Models Ready as Zero-shot Reasoners? ☆86 · Nov 24, 2025 · Updated 4 months ago
- [CVPR 2025 highlight] Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision ☆40 · Dec 2, 2025 · Updated 4 months ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)" ☆25 · Feb 2, 2025 · Updated last year
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight) ☆21 · Aug 1, 2025 · Updated 8 months ago
- Official implementation of "Re-Aligning Language to Visual Objects with an Agentic Workflow" ☆32 · Apr 20, 2025 · Updated 11 months ago
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting" ☆43 · Oct 9, 2025 · Updated 6 months ago
- ☆16 · May 9, 2024 · Updated last year
- 🔥🔥 [NeurIPS2025] Exploring and mitigating semantic hallucinations in scene text perception and reasoning ☆28 · Dec 11, 2025 · Updated 4 months ago
- ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Unde… ☆59 · Mar 3, 2026 · Updated last month
- (ICLR 2026 🔥) Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs" ☆76 · Feb 9, 2026 · Updated 2 months ago
- ☆55 · Sep 21, 2025 · Updated 6 months ago
- Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models (ACL-Findings 2024) ☆16 · Apr 23, 2024 · Updated last year
- CaptionQA: Is Your Caption as Useful as the Image Itself? ☆36 · Mar 3, 2026 · Updated last month
- Awesome latest models, datasets and benchmarks on streaming/online video understanding. ☆26 · Oct 19, 2025 · Updated 6 months ago
- A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts… ☆2,571 · Updated this week
- ☆10 · Dec 17, 2024 · Updated last year
- The first multiplayer video world model in Minecraft ☆186 · Mar 3, 2026 · Updated last month
- 📚 A curated collection of papers and open-source code repositories dedicated to the application of Vision-Language Models (VLMs) for str… ☆114 · Updated this week
- 🔥 An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks. ☆166 · Mar 16, 2026 · Updated last month
- Official implementation of "CONCRETE: Improving Cross-lingual Fact Checking with Cross-lingual Retrieval" (COLING'22) ☆15 · Oct 13, 2022 · Updated 3 years ago
- Official PyTorch Implementation of "SVG-T2I: Scaling up Text-to-Image Latent Diffusion Model Without Variational Autoencoder". ☆142 · Dec 18, 2025 · Updated 4 months ago
- [ACL 2026 Findings, ICCV 2025 Workshop Outstanding Paper Award] VChain: Chain-of-Visual-Thought for Reasoning in Video Generation ☆116 · Apr 8, 2026 · Updated last week
- Visual Speech Recognition ☆20 · Dec 24, 2024 · Updated last year
- Start and Fork ⭐⭐⭐ ☆12 · Jun 9, 2023 · Updated 2 years ago
- ☆86 · Oct 10, 2025 · Updated 6 months ago
- ☆86 · Jan 13, 2026 · Updated 3 months ago
- [ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models" ☆55 · Feb 10, 2025 · Updated last year
- ViT models pretrained with up to ~5k hours of human-like video data ☆14 · Aug 10, 2023 · Updated 2 years ago
- A list of works on video generation towards world model ☆458 · Mar 21, 2026 · Updated 3 weeks ago
- ☆24 · Nov 29, 2024 · Updated last year
- ☆13 · Jul 19, 2022 · Updated 3 years ago
- The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models? ☆43 · Nov 1, 2024 · Updated last year
- ☆43 · Dec 18, 2025 · Updated 4 months ago
- Sandbox for generating visualizations of the bias-variance tradeoff for Machine Learning at Berkeley's blog. ☆13 · Jun 26, 2017 · Updated 8 years ago
- Official code release for the paper Trapped in texture bias? A large scale comparison of deep instance segmentation, accepted at ECCV 202… ☆16 · Jan 16, 2024 · Updated 2 years ago