DreamMr/HR-Bench

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/DreamMr/HR-Bench)

DreamMr / HR-Bench

PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models"

☆51

Alternatives and similar repositories for HR-Bench

Users that are interested in HR-Bench are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

om-ai-lab / ZoomEye
View on GitHub
[EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆91Nov 20, 2025Updated 8 months ago
xavier-yu114 / Zoom-Refine
View on GitHub
Zoom-Refine: Boosting High-Resolution Multimodal Understanding via Localized Zoom and Self-Refinement
☆19Jul 4, 2026Updated 3 weeks ago
DreamMr / RAP
View on GitHub
Code for Retrieval-Augmented Perception （ICML 2025)
☆74Apr 22, 2026Updated 3 months ago
DreamMr / TranX-Adapter
View on GitHub
Code for TranX-Adapter (ICML 2026)
☆16Jun 3, 2026Updated last month
Tennine2077 / HiDe
View on GitHub
[ICML 2026] HiDe: Rethinking The Zoom-IN method in High Resolution MLLMs via Hierarchical Decoupling
☆27May 2, 2026Updated 2 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
microsoft / PixelCraft
View on GitHub
[ICLR 2026] High-Fidelity Visual Reasoning on Structured Images
☆30Jul 17, 2026Updated last week
bzluan / TextCoT
View on GitHub
[ACM TOMM] Official implementation of "TextCoT: Zoom-In for Enhanced Multimodal Text-Rich Image Understanding"
☆45Feb 27, 2026Updated 4 months ago
saccharomycetes / mllms_know
View on GitHub
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
☆381Apr 20, 2025Updated last year
DreamMr / EST
View on GitHub
Expression Snippet Transformer for Robust Video-based Facial Expression Recognition
☆17Jan 27, 2024Updated 2 years ago
alphadl / SafeLLM_with_IntentionAnalysis
View on GitHub
Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting
☆21Mar 25, 2024Updated 2 years ago
jiazhen-code / PhD
View on GitHub
[CVPR25 Highlight] A ChatGPT-Prompted Visual hallucination Evaluation Dataset, featuring over 100,000 data samples and four advanced eval…
☆32Apr 16, 2025Updated last year
penghao-wu / vstar
View on GitHub
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
☆707Jan 7, 2024Updated 2 years ago
THUNLP-MT / Brote
View on GitHub
☆11Jan 19, 2025Updated last year
luckybird1994 / IPSeg
View on GitHub
Towards Training-free Open-world Segmentation via Image Prompt Foundation Models,
☆18Nov 22, 2024Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
Sreyan88 / VDGD
View on GitHub
Code for ICLR 2025 Paper: Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
☆25May 7, 2025Updated last year
Haoyu-ha / LNLN
View on GitHub
Towards Robust Multimodal Sentiment Analysis with Incomplete Data
☆119Feb 24, 2026Updated 5 months ago
Haochen-Wang409 / TreeVGR
View on GitHub
[ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
☆92Jan 26, 2026Updated 6 months ago
zhaochen0110 / Awesome_Think_With_Images
View on GitHub
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,493Mar 9, 2026Updated 4 months ago
AFeng-x / Draw-and-Understand
View on GitHub
[ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
☆94Dec 1, 2025Updated 7 months ago
MengLcool / DeepStack-VL
View on GitHub
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆93Jun 17, 2024Updated 2 years ago
DearCaat / E2E-WSI-ABMILX
View on GitHub
[NeurIPS 2025] Revisiting End-to-End Learning with Slide-level Supervision in Computational Pathology
☆35Oct 20, 2025Updated 9 months ago
MLRM-Halu / MLRM-Halu
View on GitHub
[NeurIPS 2025] More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
☆82May 31, 2025Updated last year
FreedomIntelligence / TRIM
View on GitHub
We introduce new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their…
☆22Jan 11, 2026Updated 6 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
PKU-ICST-MIPL / DyFo_CVPR2025
View on GitHub
☆116Aug 14, 2025Updated 11 months ago
OmniMMI / OpenOmniNexus
View on GitHub
a fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.
☆38Apr 7, 2025Updated last year
deepcs233 / Visual-CoT
View on GitHub
[Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆447Dec 22, 2024Updated last year
Hanhpt23 / OmniMod
View on GitHub
MCOUT: Multimodal Chain of Continuous Thought for Latent Reasoning
☆21Oct 4, 2025Updated 9 months ago
miaoyuchun / InfoRM
View on GitHub
The official implementation of InfoRM [NeurIPS 2024].
☆16Oct 25, 2025Updated 9 months ago
Visual-Agent / DeepEyesV2
View on GitHub
☆624Feb 26, 2026Updated 5 months ago
ShenzheZhu / JailDAM
View on GitHub
[COLM 2025] JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
☆26Nov 25, 2025Updated 8 months ago
CYWang735 / AdaTooler-V
View on GitHub
☆71Feb 27, 2026Updated 4 months ago
Visual-Agent / DeepEyes
View on GitHub
☆1,251Nov 20, 2025Updated 8 months ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
UMass-Embodied-AGI / FlexAttention
View on GitHub
[ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models
☆49Jan 8, 2025Updated last year
kai422 / SCALE
View on GitHub
[ICLR 2024] Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement.
☆15Mar 12, 2024Updated 2 years ago
Niujunbo2002 / NativeRes-LLaVA
View on GitHub
Official code repo for our work "Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models"
☆55Jun 17, 2025Updated last year
seilk / VisAttnSink
View on GitHub
[ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models
☆116Feb 16, 2025Updated last year
YuHengsss / Q-Zoom
View on GitHub
☆15Apr 15, 2026Updated 3 months ago
kevinyaobytedance / llm_eval
View on GitHub
LLM evaluation.
☆16Nov 7, 2023Updated 2 years ago
ltpo2025 / LTPO
View on GitHub
[ICLR 2026] Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization
☆32Mar 6, 2026Updated 4 months ago