Are Video Models Ready as Zero-shot Reasoners?
☆84Nov 24, 2025Updated 3 months ago
Alternatives and similar repositories for MME-CoF
Users who are interested in MME-CoF are comparing it to the repositories listed below
- [NeurIPS 2025] Encoder-Decoder Diffusion Language Models for Efficient Training and Inference☆36Oct 29, 2025Updated 4 months ago
- A Curated List of Vision-Language-Action (VLA) and World Action Models (WAM) Research and Beyond☆100Updated this week
- [ICCV2025] The official code of "DreamRelation: Relation-Centric Video Customization"☆27Feb 4, 2026Updated last month
- The first Interleaved framework for textual reasoning within the visual generation process☆158Updated this week
- CVPR2025 | TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation☆38Jan 29, 2026Updated last month
- ☆41Oct 29, 2025Updated 4 months ago
- [NIPS 2025] FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens☆20Oct 12, 2025Updated 4 months ago
- Reinforcing Text-Rich Video Reasoning with Visual Rumination☆27Nov 24, 2025Updated 3 months ago
- Just wanna see what type and how many GPUs/TPUs are used in CVPR 2025 oral papers. Fun vibe coding with LLMs.☆12Apr 24, 2025Updated 10 months ago
- ☆21Feb 13, 2026Updated 3 weeks ago
- Implementation for "DeltaPhi: Learning Physical Trajectory Residual for PDE Solving"☆13Jun 17, 2024Updated last year
- ☆30Dec 18, 2025Updated 2 months ago
- InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion☆83Dec 27, 2025Updated 2 months ago
- [NeurIPS 2025 D&B (Spotlight🌟)] TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenario☆30Oct 5, 2025Updated 5 months ago
- Official implementation of "MV-TAP: Tracking Any Point in Multi-View Videos"☆39Feb 24, 2026Updated last week
- [CVPR 2026] Official pytorch implementation of "ReDirector: Creating Any-Length Video Retakes with Rotary Camera Encoding"☆21Dec 17, 2025Updated 2 months ago
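- The WMS inventory management system is a PHP-based web application designed to help small and medium-sized enterprises manage their inventory, supplier, and product information. It provides an intuitive user interface and powerful features that make inventory management simple and efficient.☆20Feb 18, 2025Updated last year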
- This is the official repository for the paper "FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehe…☆126Jan 29, 2026Updated last month
- ☆45Dec 6, 2025Updated 3 months ago
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms☆25Dec 21, 2025Updated 2 months ago
- Code for "StreamingTalker: Audio-driven 3D Facial Animation with Autoregressive Diffusion Model", AAAI2026 Oral☆45Jan 16, 2026Updated last month
- ☆36Dec 16, 2025Updated 2 months ago
- open-sourced video dataset with dynamic scenes and camera movements annotation☆86Apr 24, 2025Updated 10 months ago
- [CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation☆858May 23, 2025Updated 9 months ago
- 📷 [CVPR'26] Camera-controlled text-to-video generation, now with intrinsics, distortion and orientation control!☆126Feb 21, 2026Updated 2 weeks ago
- Official implementation of "Repurposing Video Diffusion Transformers for Robust Point Tracking"☆40Dec 24, 2025Updated 2 months ago
- ☆48Apr 14, 2025Updated 10 months ago
- [CVPR 2026] SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time☆99Jan 1, 2026Updated 2 months ago
- VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs☆50Jan 5, 2026Updated 2 months ago
- ☆53Dec 10, 2025Updated 2 months ago
- Official Implementation of "Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation"☆46Jan 29, 2026Updated last month
- [NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning☆101Sep 19, 2025Updated 5 months ago
- [CVPR 2026] ZipMap: Linear-Time Stateful 3D Reconstruction via Test-Time Training☆178Updated this week
- [CVPR 2025 highlight] Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision☆37Dec 2, 2025Updated 3 months ago
- RePlan: Reasoning-Guided Region Planning for Complex Instruction-Based Image Editing☆58Dec 26, 2025Updated 2 months ago
- EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing [ICLR 2026]☆126Feb 6, 2026Updated last month
- A low-cost avatar system☆68Nov 13, 2025Updated 3 months ago
- This is the official project repository for "DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diff…☆37Aug 27, 2025Updated 6 months ago
- ☆40Jun 10, 2025Updated 8 months ago