thu-coai / VPO
☆11 Updated 3 months ago
Alternatives and similar repositories for VPO
Users that are interested in VPO are comparing it to the libraries listed below
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆27 Updated last month
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM" ☆16 Updated last week
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆16 Updated 2 months ago
- Code for paper: Unified Text-to-Image Generation and Retrieval ☆15 Updated last year
- ☆23 Updated last week
- Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆24 Updated last week
- ☆42 Updated 8 months ago
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows ☆15 Updated last month
- Repo of FocusedAD ☆13 Updated 2 months ago
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better ☆31 Updated last month
- [NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models" ☆16 Updated 9 months ago
- The official code of "PixelWorld: Towards Perceiving Everything as Pixels" ☆14 Updated 5 months ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model ☆16 Updated 5 months ago
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆40 Updated last month
- Quick Long Video Understanding ☆58 Updated last month
- SFT+RL boosts multimodal reasoning ☆19 Updated 2 weeks ago
- ☆19 Updated last week
- official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆36 Updated 5 months ago
- Code of our paper "A Unified Agentic Framework for Evaluating Conditional Image Generation". ☆25 Updated 3 months ago
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025] ☆18 Updated 4 months ago
- [ICLR 2025] Weighted-Reward Preference Optimization for Implicit Model Fusion ☆13 Updated 4 months ago
- On Path to Multimodal Generalist: General-Level and General-Bench ☆17 Updated this week
- The official repo of continuous speculative decoding ☆27 Updated 3 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆31 Updated 4 months ago
- Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation". ☆16 Updated 2 months ago
- (NeurIPS 2024) BiDM: Pushing the Limit of Quantization for Diffusion Models ☆21 Updated 7 months ago
- ☆11 Updated 5 months ago
- LMM solved catastrophic forgetting, AAAI2025 ☆44 Updated 3 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆46 Updated 6 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆26 Updated 2 months ago