PKU-ICST-MIPL/DyFo_CVPR2025



☆116

Alternatives and similar repositories for DyFo_CVPR2025

Users who are interested in DyFo_CVPR2025 are comparing it to the repositories listed below.


saccharomycetes / mllms_know
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
☆381 · Apr 20, 2025 · Updated last year
XLearning-SCU / Reliable_TWI
Pytorch Implementation of Reliable Thinking with Images.
☆26 · May 3, 2026 · Updated 2 months ago
zifuwan / ONLY
[ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
☆51 · Jul 7, 2025 · Updated last year
Tennine2077 / HiDe
[ICML 2026] HiDe: Rethinking The Zoom-IN method in High Resolution MLLMs via Hierarchical Decoupling
☆27 · May 2, 2026 · Updated 2 months ago
zhaochen0110 / Awesome_Think_With_Images
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,493 · Mar 9, 2026 · Updated 4 months ago
om-ai-lab / ZoomEye
[EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆91 · Nov 20, 2025 · Updated 8 months ago
PKU-ICST-MIPL / Finedefics_ICLR2025
☆94 · Mar 20, 2026 · Updated 4 months ago
AntResearchNLP / ViLaSR
[NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
☆98 · Jul 27, 2025 · Updated 11 months ago
xmed-lab / TAM
[ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs
☆189 · Dec 14, 2025 · Updated 7 months ago
maifoundations / GCoT
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation
☆15 · Aug 11, 2025 · Updated 11 months ago
Visual-Agent / DeepEyes
☆1,250 · Nov 20, 2025 · Updated 8 months ago
jcwang0602 / PLVL
Progressive Language-guided Visual Learning for Multi-Task Visual Grounding
☆13 · May 9, 2025 · Updated last year
yu-rp / VisualPerceptionToken
☆136 · Mar 22, 2025 · Updated last year
PhoebusSi / VQA-VS
Code for our EMNLP-2022 paper: "Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA"
☆40 · Nov 1, 2022 · Updated 3 years ago
PKU-ICST-MIPL / TARA_CVPR2026
☆17 · Mar 21, 2026 · Updated 4 months ago
LzVv123456 / VISTA
☆86 · Jul 28, 2025 · Updated 11 months ago
Pter61 / osrcir
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval [CVPR 2025 Highlight]
☆72 · Jul 8, 2025 · Updated last year
JIA-Lab-research / VisionReasoner
[ICLR 2026] VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
☆348 · Feb 9, 2026 · Updated 5 months ago
baoqianyue / DFC2021-Track-MSD
Third place of 2021 IEEE GRSS Data Fusion Contest: Track MSD
☆10 · Mar 31, 2021 · Updated 5 years ago
DongSky / MR-GDINO
☆54 · Dec 23, 2024 · Updated last year
yejipark-m / ConVis
[AAAI 2025] ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Mode…
☆25 · Sep 26, 2024 · Updated last year
UCSB-AI / GRIT
Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
☆191 · Jan 16, 2026 · Updated 6 months ago
1zhou-Wang / MemVR
[ICML 2025] Official implementation of paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in…
☆171 · Sep 25, 2025 · Updated 9 months ago
whongzhong / MMHalSnowball
Official resource for paper Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models (ACL 20…
☆18 · Aug 12, 2024 · Updated last year
ChenAnno / SPIRIT_TOMM2024
Official implementation for "SPIRIT: Style-guided Patch Interaction for Fashion Image Retrieval with Text Feedback"
☆16 · Oct 27, 2025 · Updated 8 months ago
Hanhpt23 / OmniMod
MCOUT: Multimodal Chain of Continuous Thought for Latent Reasoning
☆21 · Oct 4, 2025 · Updated 9 months ago
DreamMr / HR-Bench
PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…
☆49 · Mar 2, 2026 · Updated 4 months ago
KeNiu042 / Diffusion-ReID
Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training
☆11 · Jan 23, 2024 · Updated 2 years ago
LiuYuML / NT-VOT211
[ACCV 2024 (Oral, Best Application Paper)] Official Implementation of NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tra…
☆16 · Dec 30, 2025 · Updated 6 months ago
JIA-Lab-research / Seg-Zero
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
☆635 · Jan 17, 2026 · Updated 6 months ago
ChenAnno / FashionERN_AAAI2024
Official implementation for "FashionERN: Enhance-and-Refine Network for Composed Fashion Image Retrieval"
☆20 · Oct 27, 2025 · Updated 8 months ago
ChantalMP / RaDialog_v2
LLaVa Version of RaDialog
☆26 · May 27, 2025 · Updated last year
mrwu-mac / ControlMLLM
[NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'
☆210 · Jul 17, 2025 · Updated last year
seilk / VisAttnSink
[ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models
☆116 · Feb 16, 2025 · Updated last year
PolyU-ChenLab / UniPixel
🔮 UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning (NeurIPS 2025)
☆247 · Jan 4, 2026 · Updated 6 months ago
ghchen18 / acl23_mclip
The official code and model for ACL 2023 paper 'mCLIP: Multilingual CLIP via Cross-lingual Transfer'
☆10 · Jan 23, 2024 · Updated 2 years ago
RenlyH / CodeV
[CVPR 2026 Oral] Code with Image
☆31 · Dec 5, 2025 · Updated 7 months ago
ligeng0197 / Awesome-Thinking-With-Images
Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grain…
☆113 · Aug 21, 2025 · Updated 11 months ago
penghao-wu / vstar
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
☆707 · Jan 7, 2024 · Updated 2 years ago