Are Video Models Ready as Zero-shot Reasoners?
☆86Nov 24, 2025Updated 4 months ago
Alternatives and similar repositories for MME-CoF
Users that are interested in MME-CoF are comparing it to the libraries listed below.
- 📝The official repository of "Rethinking Cross-Generator Image Forgery Detection through DINOv3"☆22Dec 2, 2025Updated 3 months ago
- The first Interleaved framework for textual reasoning within the visual generation process☆161Mar 16, 2026Updated last week
- [ICCV2025] The official code of "DreamRelation: Relation-Centric Video Customization"☆28Feb 4, 2026Updated last month
- ☆19Sep 24, 2024Updated last year
- [NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning☆102Sep 19, 2025Updated 6 months ago
- CVPR2025 | TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation☆42Jan 29, 2026Updated 2 months ago
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms☆25Dec 21, 2025Updated 3 months ago
- [CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation☆861Mar 19, 2026Updated last week
- This is a framework for evaluating reasoning in foundational Video Models.☆83Mar 7, 2026Updated 3 weeks ago
- Implementation for "DeltaPhi: Learning Physical Trajectory Residual for PDE Solving"☆13Jun 17, 2024Updated last year
- This is the official repository for the paper "FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehe…☆128Jan 29, 2026Updated 2 months ago
- ☆31Dec 18, 2025Updated 3 months ago
- Just wanna see what type and how many GPUs/TPUs are used in CVPR 2025 oral papers. Fun vibe coding with LLMs.☆12Apr 24, 2025Updated 11 months ago
- 「AAAI 2024」 Referred by Multi-Modality: A Unified Temporal Transformers for Video Object Segmentation☆83Jun 13, 2025Updated 9 months ago
- ☆57Oct 17, 2021Updated 4 years ago
- ☆22Feb 13, 2026Updated last month
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983…☆85Mar 9, 2026Updated 2 weeks ago
- InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion☆85Dec 27, 2025Updated 3 months ago
- Official implementation of "Repurposing Video Diffusion Transformers for Robust Point Tracking"☆42Dec 24, 2025Updated 3 months ago
- Mirage: One-Step Video Diffusion for Photorealistic and Coherent Asset Editing in Driving Scenes☆27Mar 12, 2026Updated 2 weeks ago
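- The WMS inventory management system is a PHP-based web application designed to help small and medium-sized businesses manage their inventory, supplier, and product information. The system provides an intuitive user interface and powerful features that make inventory management simple and efficient.☆21Feb 18, 2025Updated last year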
- [CVPR 2025 highlight] Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision☆39Dec 2, 2025Updated 3 months ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency☆137Aug 5, 2025Updated 7 months ago
- The first open-domain closed-loop revisited benchmark for evaluating memory consistency and action control in world models.☆49Feb 10, 2026Updated last month
- Official implementation of "MV-TAP: Tracking Any Point in Multi-View Videos"☆41Mar 10, 2026Updated 2 weeks ago
- ☆15Mar 18, 2025Updated last year
- ☆66Feb 1, 2026Updated last month
- 📷 [CVPR'26] Camera-controlled text-to-video generation, now with intrinsics, distortion and orientation control!☆138Mar 19, 2026Updated last week
- [ICML 2025] EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM☆71Jul 16, 2025Updated 8 months ago
- ☆86Oct 10, 2025Updated 5 months ago
- ☆48Apr 14, 2025Updated 11 months ago
- Open-source video dataset with annotations for dynamic scenes and camera movements☆89Apr 24, 2025Updated 11 months ago
- ☆42Oct 29, 2025Updated 5 months ago
- [NeurIPS 2025 D&B (Spotlight🌟)] TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenario☆30Oct 5, 2025Updated 5 months ago
- A low-cost avatar system☆68Nov 13, 2025Updated 4 months ago
- [NIPS 2025] FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens☆21Oct 12, 2025Updated 5 months ago
- ☆11Nov 21, 2022Updated 3 years ago
- [ICCV 2025 Workshop Outstanding Paper Award] VChain: Chain-of-Visual-Thought for Reasoning in Video Generation☆115Oct 7, 2025Updated 5 months ago
- [NeurIPS 2025] T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT☆432Sep 18, 2025Updated 6 months ago