AMAP-ML / NarrLV
NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models
⭐109 · Updated 2 weeks ago
Alternatives and similar repositories for NarrLV
Users interested in NarrLV are comparing it to the repositories listed below.
- [ICCV 25] VMBench: A Benchmark for Perception-Aligned Video Motion Generation ⭐59 · Updated last week
- 📚 This is a repository for organizing papers, codes, and other resources related to unified multimodal models. ⭐271 · Updated last week
- Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model. ⭐62 · Updated last month
- 🔥 CVPR 2025 Multimodal Large Language Models Paper List ⭐149 · Updated 5 months ago
- Official repository for VisionZip (CVPR 2025) ⭐332 · Updated 3 weeks ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ⭐366 · Updated this week
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ⭐143 · Updated last week
- [CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga ⭐112 · Updated 4 months ago
- Collections of Papers and Projects for Multimodal Reasoning. ⭐105 · Updated 3 months ago
- A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems ⭐343 · Updated last week
- 📚 This is a repository for organizing papers, codes and other resources related to unified multimodal models. ⭐661 · Updated last week
- Official implementation of UnifiedReward & UnifiedReward-Think ⭐497 · Updated last week
- Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ⭐79 · Updated 3 weeks ago
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ⭐272 · Updated 3 months ago
- [TMLR 2025 🔥] A survey for the autoregressive models in vision. ⭐673 · Updated this week
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs' ⭐243 · Updated 3 months ago
- [ICCV25] USP: Unified Self-Supervised Pretraining for Image Generation and Understanding ⭐85 · Updated last month
- ⭐18 · Updated 4 months ago
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ⭐360 · Updated 7 months ago
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ⭐117 · Updated 9 months ago
- [NeurIPS2024] Repo for the paper 'ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models' ⭐187 · Updated 3 weeks ago
- R1-like Video-LLM for Temporal Grounding ⭐110 · Updated last month
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key ⭐69 · Updated 2 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation ⭐129 · Updated 6 months ago
- ⭐99 · Updated 4 months ago
- ⭐13 · Updated 3 months ago
- ⭐134 · Updated 6 months ago
- [NeurIPS 2024] Visual Perception by Large Language Model's Weights ⭐45 · Updated 4 months ago
- This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning". ⭐34 · Updated last month
- [CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inpu… ⭐85 · Updated 3 months ago