AMAP-ML / RealQA
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model.
☆42 · Updated last month
Alternatives and similar repositories for RealQA:
Users who are interested in RealQA are comparing it to the repositories listed below.
- VMBench: A Benchmark for Perception-Aligned Video Motion Generation ☆45 · Updated last month
- USP: Unified Self-Supervised Pretraining for Image Generation and Understanding ☆62 · Updated 2 weeks ago
- GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning ☆119 · Updated this week
- Unifying Visual Understanding and Generation with Dual Visual Vocabularies 🌈 ☆43 · Updated 3 weeks ago
- The Next Step Forward in Multimodal LLM Alignment ☆149 · Updated last week
- Envolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward ☆34 · Updated last month
- Official implementation of Unified Reward Model for Multimodal Understanding and Generation. ☆243 · Updated this week
- [CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga ☆78 · Updated last month
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment. ☆72 · Updated this week
- Empowering Unified MLLM with Multi-granular Visual Generation ☆119 · Updated 3 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆81 · Updated last month
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆47 · Updated 2 months ago
- ✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆114 · Updated 2 months ago
- VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning ☆115 · Updated 2 weeks ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆75 · Updated 3 weeks ago
- Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction" ☆197 · Updated 2 weeks ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆318 · Updated 2 months ago
- Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models ☆106 · Updated last month
- Collections of Papers and Projects for Multimodal Reasoning. ☆104 · Updated 2 weeks ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning ☆25 · Updated 3 weeks ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆55 · Updated last month
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling ☆93 · Updated 3 months ago
- Official repository of MMDU dataset ☆89 · Updated 7 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆116 · Updated 4 months ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆48 · Updated last month
- Official Implementation of OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation ☆17 · Updated last month
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆84 · Updated 8 months ago