AMAP-ML / VMBench-Website
☆17Updated last month
Alternatives and similar repositories for VMBench-Website:
Users who are interested in VMBench-Website are comparing it to the repositories listed below
- Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model.☆42Updated last month
- VMBench: A Benchmark for Perception-Aligned Video Motion Generation☆45Updated last month
- USP: Unified Self-Supervised Pretraining for Image Generation and Understanding☆62Updated 2 weeks ago
- GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning☆119Updated this week
- Collections of Papers and Projects for Multimodal Reasoning.☆104Updated 2 weeks ago
- ✨First Open-Source R1-like Video-LLM [2025/02/18]☆333Updated 2 months ago
- This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehens…☆69Updated last week
- The Next Step Forward in Multimodal LLM Alignment☆153Updated last week
- Official code for CVPR 2024 paper, "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models"☆16Updated last year
- [CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding☆273Updated 7 months ago
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'☆185Updated 2 weeks ago
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"☆164Updated 3 months ago
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …☆307Updated 4 months ago
- [Blog 1] Recording a bug of grpo_trainer in some R1 projects☆20Updated 2 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding☆58Updated 2 weeks ago
- ☆117Updated 2 months ago
- Evolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward☆34Updated last month
- A journey to real multimodal R1! We are running large-scale experiments☆298Updated 2 months ago
- [ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference".☆102Updated last week
- VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning☆115Updated 2 weeks ago
- Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More"☆40Updated last week
- Papers about Hallucination in Multi-Modal Large Language Models (MLLMs)☆89Updated 5 months ago
- [CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga☆79Updated last month
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling☆93Updated 3 months ago
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.☆59Updated last month
- [LLaVA-Video-R1]✨First Adaptation of R1 to LLaVA-Video (2025-03-18)☆27Updated this week
- Code for "UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning"☆91Updated this week
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆61Updated last week
- p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay☆35Updated 4 months ago
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…☆740Updated this week