wangyuchi369 / RICO
Official implementation of the paper: RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
☆17 Updated 3 months ago
Alternatives and similar repositories for RICO
Users who are interested in RICO are comparing it to the repositories listed below.
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆56 Updated 2 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆128 Updated 3 months ago
- ☆126 Updated 3 months ago
- ✈️ [ICCV 2025] Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints ☆75 Updated 2 months ago
- Official implementation of MIA-DPO ☆66 Updated 8 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆74 Updated 2 months ago
- ☆75 Updated 3 months ago
- Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unifie… ☆206 Updated last week
- Official implementation of Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning ☆163 Updated this week
- [ICCV2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆164 Updated 4 months ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆86 Updated last year
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆148 Updated last month
- Pixel-Level Reasoning Model trained with RL [NeurIPS 2025] ☆210 Updated 2 weeks ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆56 Updated 2 months ago
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment. ☆83 Updated 4 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆130 Updated 8 months ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆134 Updated this week
- [CVPR2025] Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think ☆24 Updated 2 months ago
- Structured Video Comprehension of Real-World Shorts ☆193 Updated this week
- Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search" ☆268 Updated last week
- ICML2025 ☆57 Updated 3 weeks ago
- UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning ☆141 Updated 3 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆215 Updated last month
- The code repository of UniRL ☆40 Updated 3 months ago
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ☆287 Updated 4 months ago
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆69 Updated 11 months ago
- A Large-scale Dataset for training and evaluating a model's ability on Dense Text Image Generation ☆79 Updated last month
- [NeurIPS 2025 D&B Oral] Official repository of the paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆95 Updated 2 weeks ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆117 Updated last month
- ☆56 Updated 2 weeks ago