Code for Retrieval-Augmented Perception (ICML 2025)
☆71 · Apr 22, 2026 · Updated last month
Alternatives and similar repositories for RAP
Users who are interested in RAP are comparing it to the libraries listed below.
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg… ☆48 · Mar 2, 2026 · Updated 3 months ago
- Code for LLM_Catastrophic_Forgetting via SAM. ☆11 · Jun 7, 2024 · Updated 2 years ago
- 🚀 Enhanced GRPO with more verifiable rewards and real-time evaluators ☆37 · Jan 27, 2026 · Updated 4 months ago
- Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting ☆21 · Mar 25, 2024 · Updated 2 years ago
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆90 · Nov 20, 2025 · Updated 6 months ago
- The official implementation of InfoRM [NeurIPS 2024]. ☆15 · Oct 25, 2025 · Updated 7 months ago
- The code for the paper "Dual Mutual Information Constraints for Discriminative Clustering" ☆23 · Aug 22, 2024 · Updated last year
- Expression Snippet Transformer for Robust Video-based Facial Expression Recognition ☆17 · Jan 27, 2024 · Updated 2 years ago
- The first Object-Oriented Programming (OOP) Evaluation Benchmark for LLMs ☆27 · Jan 15, 2025 · Updated last year
- Official repo for [NeurIPS 2025 Spotlight] "GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution" ☆51 · Oct 27, 2025 · Updated 7 months ago
- Official repo for ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models ☆28 · Mar 24, 2025 · Updated last year
- A vision-language model with bidirectional progressive fusion and global-local alignment for enhanced medical image segmentation. ☆19 · Dec 25, 2025 · Updated 5 months ago
- ☆56 · May 7, 2026 · Updated last month
- ☆23 · Sep 23, 2025 · Updated 8 months ago
- [CVPR2025] Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models ☆21 · Apr 30, 2025 · Updated last year
- (TGRS 2024) OrientedFormer: An End-to-End Transformer-Based Oriented Object Detector in Remote Sensing Images ☆50 · Jul 14, 2025 · Updated 11 months ago
- auto star for repo lists ☆10 · Aug 26, 2023 · Updated 2 years ago
- The official implementation for Candidate Set Re-ranking for Composed Image Retrieval (TMLR) 01/2024 ☆20 · Feb 7, 2024 · Updated 2 years ago
- ☆16 · Aug 20, 2024 · Updated last year
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆143 · Jun 20, 2024 · Updated last year
- [ISPRS 2024] LoveNAS: Towards Multi-Scene Land-Cover Mapping via Hierarchical Searching Adaptive Network ☆33 · Dec 1, 2024 · Updated last year
- Code for WisdoM: Improving Multimodal Sentiment Analysis by Fusing Contextual World Knowledge ☆17 · Dec 31, 2024 · Updated last year
- This is the official repository for the paper: cross-modal information flow in multimodal large language models ☆44 · May 21, 2025 · Updated last year
- ☆31 · Feb 10, 2025 · Updated last year
- Creating High-Fidelity Synthetic GPS Trajectory Dataset for Urban Mobility Analysis ☆22 · Mar 12, 2026 · Updated 3 months ago
- ☆24 · Jun 18, 2025 · Updated 11 months ago
- Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning ☆44 · Jul 2, 2025 · Updated 11 months ago
- Official code of paper "GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis" [ICCV 2025] ☆48 · Jun 29, 2025 · Updated 11 months ago
- [ACM TOMM] Official implementation of "TextCoT: Zoom-In for Enhanced Multimodal Text-Rich Image Understanding" ☆45 · Feb 27, 2026 · Updated 3 months ago
- Official Implementation of "IRBridge: Solving Image Restoration Bridge with Pre-trained Generative Diffusion Models" ☆18 · Jun 5, 2025 · Updated last year
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs' ☆377 · Apr 20, 2025 · Updated last year
- ☆23 · Nov 29, 2024 · Updated last year
- ☆13 · Nov 2, 2025 · Updated 7 months ago
- [ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases" ☆38 · Jul 12, 2024 · Updated last year
- This is the official code of "Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation, NeurIPS 23" ☆27 · Dec 7, 2023 · Updated 2 years ago
- [CVPR'25] Official code of paper "Mimic In-Context Learning for Multimodal Tasks" ☆26 · May 21, 2026 · Updated 3 weeks ago
- [ICML 2025 Oral] This is the official repository of the paper "What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensi… ☆22 · Jun 12, 2025 · Updated last year
- [CVPR 2025] Official PyTorch implementation of "Learning with Noisy Triplet Correspondence for Composed Image Retrieval". ☆25 · Jun 9, 2025 · Updated last year
- [CVPR 2026] ZoomEarth: Active Perception for Ultra-High-Resolution Geospatial Vision-Language Tasks ☆40 · Apr 9, 2026 · Updated 2 months ago