Are Video Models Ready as Zero-shot Reasoners?
☆87Nov 24, 2025Updated 6 months ago
Alternatives and similar repositories for MME-CoF
Users that are interested in MME-CoF are comparing it to the libraries listed below.
- 📝The official repository of "Rethinking Cross-Generator Image Forgery Detection through DINOv3"☆25Dec 2, 2025Updated 6 months ago
- [TMLR] Video Generation Models: A Survey of Post-Training and Alignment | 🔥 A continuously updated collection of papers, datasets, and b…☆159Jun 10, 2026Updated last week
- The first Interleaved framework for textual reasoning within the visual generation process☆162Mar 16, 2026Updated 3 months ago
- [NeurIPS 2025] Encoder-Decoder Diffusion Language Models for Efficient Training and Inference☆43Oct 29, 2025Updated 7 months ago
- [ICCV2025] The official code of "DreamRelation: Relation-Centric Video Customization"☆26Feb 4, 2026Updated 4 months ago
- ☆19Sep 24, 2024Updated last year
- [NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning☆106Sep 19, 2025Updated 9 months ago
- CVPR2025 | TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation☆43Jan 29, 2026Updated 4 months ago
- aFun programming language☆12Feb 23, 2022Updated 4 years ago
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms☆25Dec 21, 2025Updated 5 months ago
- Official pytorch implementation of "DSPoint: Dual-scale Point Cloud Recognition with High-frequency Fusion"☆20Feb 4, 2025Updated last year
- [CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation☆865Mar 19, 2026Updated 2 months ago
- Implementation for "DeltaPhi: Learning Physical Trajectory Residual for PDE Solving"☆13Jun 17, 2024Updated 2 years ago
- This is the official repository for the paper "FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehe…☆129Jan 29, 2026Updated 4 months ago
- ☆38Jun 2, 2026Updated 2 weeks ago
- Just wanna see what type and how many GPUs/TPUs are used in CVPR 2025 oral papers. Fun vibe coding with LLMs.☆12Apr 24, 2025Updated last year
- 「AAAI 2024」 Referred by Multi-Modality: A Unified Temporal Transformers for Video Object Segmentation☆85Jun 13, 2025Updated last year
- ☆57Oct 17, 2021Updated 4 years ago
- Implementation of D4RT, Efficiently Reconstructing Dynamic Scenes, from Deepmind☆70Jun 8, 2026Updated last week
- [NeurIPS 2025] Scaling Language-centric Omnimodal Representation Learning☆44Apr 13, 2026Updated 2 months ago
- [ACL2026] Uni-MMMU : A Massive Multi-discipline Multimodal Unified Benchmark☆25Apr 13, 2026Updated 2 months ago
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983…☆98Mar 9, 2026Updated 3 months ago
- [ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?☆181Apr 28, 2025Updated last year
- InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion☆90Dec 27, 2025Updated 5 months ago
- Official implementation of "Repurposing Video Diffusion Transformers for Robust Point Tracking"☆46Dec 24, 2025Updated 5 months ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency☆135Aug 5, 2025Updated 10 months ago
- [CVPR 2025 highlight] Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision☆46Dec 2, 2025Updated 6 months ago
- ☆15Mar 18, 2025Updated last year
- ☆71Feb 1, 2026Updated 4 months ago
- This is a collection of recent papers on reasoning in video generation models.☆161Jun 12, 2026Updated last week
- Security-native LLM system for AI-generated application security.☆263Jun 4, 2026Updated 2 weeks ago
- Official implementation of "MV-TAP: Tracking Any Point in Multi-View Videos"☆50Updated this week
- [ICML 2025] EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM☆72Jul 16, 2025Updated 11 months ago
- Mirage: One-Step Video Diffusion for Photorealistic and Coherent Asset Editing in Driving Scenes☆29Mar 12, 2026Updated 3 months ago
- ☆87Oct 10, 2025Updated 8 months ago
- [NeurIPS 2025 D&B (Spotlight🌟)] TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenario☆32Oct 5, 2025Updated 8 months ago
- [NIPS 2025] FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens☆21Oct 12, 2025Updated 8 months ago
- The first open-domain closed-loop revisited benchmark for evaluating memory consistency and action control in world models.☆68May 25, 2026Updated 3 weeks ago
- ☆11Nov 21, 2022Updated 3 years ago