ZiyuGuo99 / MME-CoF
Are Video Models Ready as Zero-shot Reasoners?
☆60Updated this week
Alternatives and similar repositories for MME-CoF
Users that are interested in MME-CoF are comparing it to the libraries listed below
- [AAAI26] Next Patch Prediction☆131Updated 10 months ago
- Official implementation of X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models☆158Updated 11 months ago
- [ICCV-2025] Official implementation of Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data☆91Updated 3 months ago
- [NeurIPS'25] Official repository of Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations☆413Updated this week
- [ICLR 2025] BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities☆145Updated 9 months ago
- World Simulator Assistant for Physics-Aware Text-to-Video Generation☆250Updated last month
- ✨✨Latest advancements in VLA (Vision-Language-Action) models☆93Updated 7 months ago
- Official implementation of HCoG: Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation [CVPR 2025]☆54Updated 3 months ago
- (ICCV 2025) Enhance CLIP and MLLM's fine-grained visual representations with generative models.☆73Updated 4 months ago
- [BMVC 2025] Official implementation for paper EZIGen: Enhancing zero-shot personalized image generation with precise subject encoding and…☆105Updated 2 months ago
- [NeurIPS 2025 Spotlight] Towards Understanding Camera Motions in Any Video☆244Updated 3 weeks ago
- Reasoning 3D Segmentation - "segment anything"/grounding/part separation in 3D with natural conversations.☆85Updated last year
- This is the repository that contains source code for the PhysGen3D.☆228Updated 2 months ago
- [ICCV23] Bird’s-Eye-View Scene Graph for Vision-Language Navigation☆121Updated last year
- A Unified Driving World Model for Future Generation and Perception☆122Updated 3 months ago
- 【 ICLR 2025 】I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength☆113Updated 8 months ago
- Chain-of-Spot: Interactive Reasoning Improves Large Vision-language Models☆98Updated last year
- [NeurIPS 2025 D&B 🔥] OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation☆173Updated last week
- [NeurIPS 2024 D&B Spotlight🔥] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation☆211Updated 5 months ago
- (ICCV2023) Official implementation of 'ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance'…☆59Updated last year
- The official implementation of ACM Multimedia 2024 paper "PlacidDreamer: Advancing Harmony in Text-to-3D Generation".☆107Updated last year
- [3DV 2025]🐱🐶🐲🐮🐷Official Implementation of DreamBeast: Distilling 3D Fantastical Animals with Part-Aware Knowledge Transfer☆67Updated 7 months ago
- [ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++☆204Updated 3 months ago
- [ACMMM 2025] Official implementation of the paper "DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompti…☆208Updated 6 months ago
- [ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs☆77Updated 3 weeks ago
- ☆140Updated 7 months ago
- [ECCV 2024] Efficient Inference of Vision Instruction-Following Models with Elastic Cache☆42Updated last year
- CoS: Chain-of-Shot Prompting for Long Video Understanding☆52Updated 9 months ago
- WorldGPT: Empowering LLM as Multimodal World Model☆116Updated last year
- Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation"☆123Updated 3 weeks ago