YU-deep / ViF
[ICLR 26] Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
☆35 · Updated 4 months ago
Alternatives and similar repositories for ViF
Users interested in ViF are comparing it to the libraries listed below.
- ☆66 · Updated this week
- ☆19 · Updated 2 months ago
- [CVPR 2025] RAP: Retrieval-Augmented Personalization ☆79 · Updated 2 months ago
- ☆18 · Updated 6 months ago
- Training Autoregressive Image Generation models via Reinforcement Learning ☆50 · Updated 2 months ago
- [ICCV 2025] HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets ☆62 · Updated 6 months ago
- [NeurIPS 2025 D&B Oral] Official repository of the paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆140 · Updated this week
- ☆13 · Updated last year
- [NeurIPS 2025 Spotlight] VisualQuality-R1 is the first open-source NR-IQA model that can accurately describe and rate image quality. ☆153 · Updated 3 months ago
- [CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient ☆108 · Updated 4 months ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" ☆34 · Updated 7 months ago
- CAR: Controllable AutoRegressive Modeling for Visual Generation ☆128 · Updated last year
- [NeurIPS 2024] ☆36 · Updated last year
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025) ☆50 · Updated 3 months ago
- Official repository for the UAE paper, unified-GRPO, and unified-Bench ☆156 · Updated 4 months ago
- Repo for "Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content" ☆39 · Updated 8 months ago
- ☆41 · Updated last month
- Q-Insight is open-sourced at https://github.com/bytedance/Q-Insight. This repository will not receive further updates. ☆142 · Updated 8 months ago
- [ICML 2025 Spotlight] MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding ☆66 · Updated 6 months ago
- List of diffusion-related active submissions on OpenReview for ICLR 2025. ☆52 · Updated last year
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆236 · Updated 5 months ago
- [ICCV 2025 Highlight] LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs ☆19 · Updated 2 months ago
- [ECCV 2024] Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models ☆56 · Updated last year
- UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation ☆121 · Updated last month
- Official repository for Scone (Subject-driven Composition and Distinction Enhancement) model, designed to support multi-subject compositi… ☆28 · Updated 3 weeks ago
- [NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning ☆78 · Updated 4 months ago
- [ACM MM 2025 - Dataset Track] ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies ☆22 · Updated 7 months ago
- Unified Multi-modal IAA Baseline and Benchmark ☆92 · Updated last year
- (NeurIPS 2025) Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation ☆62 · Updated 3 months ago
- [ECCV'24] MaxFusion: Plug & Play multimodal generation in text-to-image diffusion models ☆27 · Updated last year