callsys/DynRefer

[CVPR 2025] DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution

☆59

Alternatives and similar repositories for DynRefer

Users who are interested in DynRefer are comparing it to the repositories listed below.

callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆81 · Oct 25, 2024 · Updated last year
mingrui-wu / OSI-Bench
Official repo of From Indoor to Open World: Revealing the Spatial Reasoning Gap in MLLMs
☆24 · Jun 23, 2026 · Updated last month
MzeroMiko / XDLM
[ICML 2026 Spotlight] Code for miXed Discrete Diffusion Language Model
☆27 · Mar 16, 2026 · Updated 4 months ago
zhaoyangwei123 / SAPNet
[CVPR 2024] Semantic-aware SAM for Point-Prompted Instance Segmentation
☆38 · Jan 20, 2025 · Updated last year
callsys / GMPO
[ICLR 2026] Geometric-Mean Policy Optimization
☆104 · Jan 26, 2026 · Updated 5 months ago
qiujihao19 / LongVideo-R1
[CVPR 2026] LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding
☆50 · Jul 7, 2026 · Updated 2 weeks ago
wuw2019 / LoTLIP
[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
☆49 · Jan 14, 2025 · Updated last year
chenxi52 / CMPF
[IJCV 2026] Official implementation of the paper "CMPF: Harmonizing Cross-Model Prior Fusion for Open-Vocabulary Segmentation"
☆26 · Jun 15, 2025 · Updated last year
YWenxi / think-with-images-through-self-calling
Official repo for `thinking with images through-self-calling`
☆26 · Dec 28, 2025 · Updated 6 months ago
qiujihao19 / Artemis
[NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos
☆27 · Apr 8, 2025 · Updated last year
linhuixiao / OneRef
[NeurIPS 2024] OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling.
☆32 · Nov 13, 2025 · Updated 8 months ago
tkdguraa / WSOL_binarize
Official PyTorch implementation for "Discovering an inference recipe for weakly-supervised object localization"
☆17 · Aug 3, 2024 · Updated last year
martian422 / MaskGRPO
The official implementation of MaskGRPO: Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models. (ICLR 2026, arxiv…
☆19 · Jan 27, 2026 · Updated 5 months ago
junha1125 / Vision-Language-Model-in-ECCV-2024
☆17 · Oct 1, 2024 · Updated last year
mbzuai-oryx / TrackingMeetsLMM
☆10 · Apr 7, 2025 · Updated last year
XiaokunFeng / MemVLT
[NeurIPS'24] MemVLT: Vision-Language Tracking with Adaptive Memory-based Prompts
☆19 · Oct 7, 2024 · Updated last year
SivanDoveh / DAC
Repository for the paper: Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models
☆28 · Nov 29, 2023 · Updated 2 years ago
linhuixiao / HiVG
[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.
☆65 · Nov 10, 2025 · Updated 8 months ago
zeyofu / Commonsense-T2I
Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]
☆24 · Aug 13, 2024 · Updated last year
NJUDeepEngine / CAEF
Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"
☆11 · Oct 11, 2024 · Updated last year
dahyun-kang / lavg
[ECCV'24] Official PyTorch implementation of In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
☆51 · Sep 24, 2024 · Updated last year
huuuuusy / videocube-toolkit
The official Python toolkit for running experiments and evaluating performance on the VideoCube benchmark @TPAMI2023
☆31 · Apr 1, 2024 · Updated 2 years ago
cv516Buaa / OV-VG
☆31 · Mar 25, 2024 · Updated 2 years ago
jacobmarks / fiftyone_florence2_plugin
Run SOTA Vision-Language Model Florence-2 on your data!
☆15 · Mar 27, 2025 · Updated last year
linhuixiao / CLIP-VG
[TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.
☆135 · Nov 10, 2025 · Updated 8 months ago
chengyzhao / TextPSG
☆19 · Oct 22, 2023 · Updated 2 years ago
mightyzau / RegionBLIP
☆59 · Aug 7, 2023 · Updated 2 years ago
Yui010206 / VEGGIE-VidEdit
[ICCV 2025] VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation
☆34 · Aug 18, 2025 · Updated 11 months ago
MzeroMiko / vHeat
vHeat: Building Vision Models upon Heat Conduction
☆282 · Jun 12, 2025 · Updated last year
keviner1 / UAPN
Official PyTorch implementation of our TGRS paper: Deep Adaptive Pansharpening via Uncertainty-aware Image Fusion.
☆14 · Aug 7, 2023 · Updated 2 years ago
ncTimTang / AKS
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding
☆228 · Dec 19, 2025 · Updated 7 months ago
Dmmm1997 / SimVG
[NeurIPS 2024] SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
☆103 · Oct 29, 2025 · Updated 8 months ago
wpy1999 / SAT
[ICCV 2023] PyTorch implementation of "Spatial-Aware Token for Weakly Supervised Object Localization".
☆23 · Oct 24, 2023 · Updated 2 years ago
MzeroMiko / mamba-mini
An efficient PyTorch implementation of selective scan in one file, works with both CPU and GPU, with corresponding mathematical derivatio…
☆109 · Oct 14, 2025 · Updated 9 months ago
WisconsinAIVision / ViP-LLaVA
[CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
☆338 · Jul 17, 2024 · Updated 2 years ago
shuheikurita / RefEgo
☆13 · Jul 20, 2024 · Updated 2 years ago
Vision-CAIR / Infinibench
Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows
☆20 · Nov 4, 2025 · Updated 8 months ago
cilinyan / ReVOS-api
[ECCV 2024] VISA: Reasoning Video Object Segmentation via Large Language Model
☆22 · Jul 20, 2024 · Updated 2 years ago
Gen-Verse / HermesFlow
[NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
☆77 · Sep 19, 2025 · Updated 10 months ago