DreamMr/RAP

DreamMr / RAP

Code for Retrieval-Augmented Perception (ICML 2025)

☆74

Alternatives and similar repositories for RAP

Users who are interested in RAP are comparing it to the repositories listed below.

DreamMr / HR-Bench
PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…
☆51 Mar 2, 2026 Updated 4 months ago
om-ai-lab / ZoomEye
[EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆91 Nov 20, 2025 Updated 8 months ago
DreamMr / TranX-Adapter
Code for TranX-Adapter (ICML 2026)
☆16 Jun 3, 2026 Updated last month
alphadl / R1
🚀 Enhanced GRPO with more verifiable rewards and real-time evaluators
☆37 Jan 27, 2026 Updated 5 months ago
kiki-zyq / ZoomSearch
Official code for the paper “Look Where It Matters: Training-Free Ultra-HR Remote Sensing VQA via Adaptive Zoom Search”.
☆27 Dec 8, 2025 Updated 7 months ago
miaoyuchun / InfoRM
The official implementation of InfoRM [NeurIPS 2024].
☆16 Oct 25, 2025 Updated 9 months ago
alphadl / 3d-gen-for-llm-builders
A hands-on guide to 3D latent diffusion for LLM/VLM builders
☆27 Apr 7, 2026 Updated 3 months ago
xavier-yu114 / Zoom-Refine
Zoom-Refine: Boosting High-Resolution Multimodal Understanding via Localized Zoom and Self-Refinement
☆19 Jul 4, 2026 Updated 3 weeks ago
Li-Hyn / DMICC
The code for the paper "Dual Mutual Information Constraints for Discriminative Clustering"
☆23 Aug 22, 2024 Updated last year
saccharomycetes / mllms_know
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
☆381 Apr 20, 2025 Updated last year
lzz11834 / SGIDN
Zhong, Y., W. Li, X. Wang, S. Jin and L. Zhang. "Satellite-ground integrated destriping network: A new perspective for EO-1 Hyperion and …
☆14 Dec 3, 2021 Updated 4 years ago
Beckschen / spatialcode
Open studio for "Thinking with Spatial Code" (https://arxiv.org/pdf/2603.05591)
☆20 Mar 18, 2026 Updated 4 months ago
weihao-bo / ViLoMem
ViLoMem: Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
☆66 Apr 21, 2026 Updated 3 months ago
NVlabs / PS3
Scaling Vision Pre-Training to 4K Resolution
☆225 Jan 4, 2026 Updated 6 months ago
muirbench / MuirBench
A Comprehensive Benchmark for Robust Multi-image Understanding
☆21 Sep 4, 2024 Updated last year
alibaba-multimodal-industrial-ai / IndustryBench
A multi-lingual benchmark for evaluating industrial domain knowledge of LLMs.
☆155 Jun 15, 2026 Updated last month
om-ai-lab / ImageRAG
Enhancing Ultrahigh Resolution Remote Sensing Imagery Analysis With ImageRAG [GRSM]
☆34 May 16, 2026 Updated 2 months ago
UCSB-AI / DMLR
[CVPR2026] Official codebase for the paper "Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space"
☆84 May 12, 2026 Updated 2 months ago
multimodal-art-projection / IV-Bench
☆14 Apr 23, 2025 Updated last year
zwq2018 / Auto_star
auto star for repo lists
☆10 Aug 26, 2023 Updated 2 years ago
pengfei-luo / ImageScope
[WWW 2025 Oral] ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning
☆21 Jul 2, 2025 Updated last year
HJYao00 / Awesome-Agentic-MLLMs
Agentic MLLMs
☆216 Oct 24, 2025 Updated 9 months ago
Cuberick-Orion / Candidate-Reranking-CIR
The official implementation for Candidate Set Re-ranking for Composed Image Retrieval (TMLR) 01/2024
☆20 Feb 7, 2024 Updated 2 years ago
chancharikmitra / CCoT
[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"
☆142 Jun 20, 2024 Updated 2 years ago
subrata-samanta / RL-Self-Improving-RAG
This project implements a Reinforcement Learning (RL) enhanced Retrieval-Augmented Generation (RAG) system that optimizes document retrie…
☆25 Apr 27, 2025 Updated last year
showlab / VideoLISA
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
☆148 Dec 26, 2024 Updated last year
wang-qiuchen / PseDet
[ICLR 2025] PseDet: Revisiting the Power of Pseudo Label in Incremental Object Detection
☆23 Sep 16, 2025 Updated 10 months ago
GuangyanS / Sys2-LLaVA
☆31 Feb 10, 2025 Updated last year
OmniMMI / OpenOmniNexus
A fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.
☆38 Apr 7, 2025 Updated last year
PKU-YuanGroup / Look-Back
This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning".
☆100 Jul 10, 2025 Updated last year
AHideoKuzeA / Evol-SAM3
☆47 Jan 1, 2026 Updated 6 months ago
EvolvingLMMs-Lab / MGPO
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
☆55 Jul 23, 2025 Updated last year
DreamMr / WisdoM
Code for WisdoM: Improving Multimodal Sentiment Analysis by Fusing Contextual World Knowledge
☆17 Dec 31, 2024 Updated last year
MediaBrain-SJTU / LoRKD
☆25 Nov 8, 2024 Updated last year
Tennine2077 / HiDe
[ICML 2026] HiDe: Rethinking The Zoom-IN method in High Resolution MLLMs via Hierarchical Decoupling
☆27 May 2, 2026 Updated 2 months ago
PKU-ICST-MIPL / DyFo_CVPR2025
☆116 Aug 14, 2025 Updated 11 months ago
MzeroMiko / XDLM
[ICML 2026 Spotlight] Code for miXed Discrete Diffusion Language Model
☆27 Mar 16, 2026 Updated 4 months ago
HKUST-LongGroup / CFA
[ICCV 2023] Compositional Feature Augmentation for Unbiased Scene Graph Generation
☆15 Dec 5, 2023 Updated 2 years ago
lerogo / aaai24_itr_cusa
Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval"
☆55 Mar 28, 2024 Updated 2 years ago