GeWu-Lab/Patch-Matters

[CVPR2025] Code Release of Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception

☆25

Alternatives and similar repositories for Patch-Matters

Users who are interested in Patch-Matters are comparing it to the repositories listed below.

daeunni / Video-Skill-CoT
Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning [EMNLP 2025 Findings]"
☆18 · Aug 27, 2025 · Updated 10 months ago
jiazhen-code / PhD
[CVPR25 Highlight] A ChatGPT-Prompted Visual hallucination Evaluation Dataset, featuring over 100,000 data samples and four advanced eval…
☆32 · Apr 16, 2025 · Updated last year
xzxxntxdy / PEPO
Official repo for "Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought"
☆26 · Mar 29, 2026 · Updated 3 months ago
WayneTomas / TransCP
[TPAMI 2024] This is the official Pytorch code for our paper "Context Disentangling and Prototype Inheriting for Robust Visual Grounding"…
☆28 · May 8, 2025 · Updated last year
penghao-wu / visual_jigsaw
☆78 · Apr 9, 2026 · Updated 3 months ago
saccharomycetes / mllms_know
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
☆380 · Apr 20, 2025 · Updated last year
HVision-NKU / ASID-Caption
ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Unde…
☆67 · Mar 3, 2026 · Updated 4 months ago
hanlinwu / ChangeChat
AN INTERACTIVE REMOTE SENSING CHANGE ANALYSIS MODEL BASED ON MULTIMODAL INSTRUCTION TUNING
☆24 · Jun 16, 2025 · Updated last year
MAGAer13 / DeCapBench
Official Code for "Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning" (ICLR 2025)
☆14 · Mar 6, 2025 · Updated last year
mims-harvard / Qworld
Qworld: Question-Specific Evaluation Criteria for LLMs
☆30 · Mar 26, 2026 · Updated 3 months ago
Sphere-AI-Lab / PEFT-Arena
Official repository of PEFT-Arena: Understanding Parameter-Efficient Finetuning from a Stability-Plasticity Perspective
☆26 · Jun 13, 2026 · Updated last month
StarDewXXX / AdaR1
The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization"
☆24 · May 6, 2026 · Updated 2 months ago
marinero4972 / CyberV
☆20 · Jun 10, 2025 · Updated last year
mace-cream / clusterhowto
☆10 · Jan 19, 2022 · Updated 4 years ago
ChengHan111 / VPT-or-FT
Official Pytorch implementation of 'Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?' (ICLR2024)
☆13 · Mar 8, 2024 · Updated 2 years ago
AVoCaDO-Captioner / AVoCaDO
https://avocado-captioner.github.io/
☆37 · Oct 16, 2025 · Updated 9 months ago
DYEvaLab / EvalMuse-Structure
☆18 · Feb 12, 2025 · Updated last year
HVision-NKU / ControlSR
☆13 · Apr 19, 2025 · Updated last year
mobiushy / move-act
☆11 · Jul 26, 2024 · Updated last year
IntMeGroup / MINT-IQA
[TMM] MINT-IQA: Quality Assessment for AI Generated Images with Instruction Tuning
☆21 · Nov 21, 2025 · Updated 8 months ago
YuqiZhang-Buaa / Mamba2MIL
☆11 · Sep 30, 2024 · Updated last year
yeerwen / Awesome-Medical-Efficient-Fine-Tuning
☆35 · Mar 25, 2025 · Updated last year
ArtemisWang / blind_movies
Generates movies for visually impaired audiences: the input is a movie script and an MKV-format movie, and the output is the movie with narration added
☆12 · Jul 28, 2019 · Updated 6 years ago
yhydhx / SAMAug
Visual Prompt Augmentation
☆40 · Dec 21, 2023 · Updated 2 years ago
Meize0729 / CCExpert
This is the PyTorch implementation of our paper "CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware…
☆40 · Jun 29, 2026 · Updated 3 weeks ago
SimarKareer / UnifiedVideoDA
We're Not Using Videos Effectively (TMLR 2024)
☆17 · Feb 4, 2024 · Updated 2 years ago
nailwatts / FNIN
FNIN: A Fourier Neural Operator-based Numerical Integration Network for Surface-form-gradients
☆13 · Jan 22, 2025 · Updated last year
fujiso / SODA
SODA: Story Oriented Dense Video Captioning Evaluation Framework
☆14 · May 3, 2024 · Updated 2 years ago
2-mo / Awesome-Thinking-with-VAD
☆16 · May 26, 2026 · Updated last month
stallone0000 / Reasoning-Skill
☆20 · May 25, 2026 · Updated last month
adxcreative / D-M
The official source code of our AAAI25 paper "D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matchin…
☆10 · Feb 9, 2025 · Updated last year
msm8976 / NightReID
[AAAI'25 Oral] NightReID: A Large-Scale Nighttime Person Re-Identification Benchmark
☆11 · Jun 10, 2025 · Updated last year
dengandong / GroundMoRe
☆18 · May 18, 2026 · Updated 2 months ago
sangminwoo / AvisC
[ACL 2025 Findings] Official pytorch implementation of "Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vis…
☆25 · Jul 21, 2024 · Updated last year
forwchen / mfcc_boaw
Extract MFCCs from videos and make bag-of-audio-words (BOAW) representations.
☆11 · Dec 20, 2018 · Updated 7 years ago
ryylcc / OWSOL
☆15 · Feb 18, 2024 · Updated 2 years ago
botbahlul / Live-Subtitle-V2
ANDROID APP that can RECOGNIZE VLC LIVE AUDIO/VIDEO STREAMING (using free Android Developers Speech Recognition API) then TRANSLATE (usin…
☆14 · May 5, 2024 · Updated 2 years ago
djene-mengistu / Awesome-Machine-Vision-and-Anomaly-Detection
This repo contains state-of-the-art deep learning models for industrial anomaly detection, defect segmentation, detection, and classifica…
☆15 · Apr 18, 2026 · Updated 3 months ago
Rainbowman0 / TML_LLIE
code for "Troublemaker Learning for Low-Light Image Enhancement"
☆21 · Mar 14, 2024 · Updated 2 years ago