dzh19990407 / PPMN
ACM MM 2022 - PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding
☆10 Updated 2 years ago
Alternatives and similar repositories for PPMN
Users who are interested in PPMN are comparing it to the repositories listed below
- Learning 1D Causal Visual Representation with De-focus Attention Networks ☆35 Updated last year
- ☆31 Updated 9 months ago
- ☆20 Updated 9 months ago
- [CVPR'24] Code for Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models ☆17 Updated last year
- The benchmark for "Video Object Segmentation in Panoptic Wild Scenes". ☆12 Updated last year
- ☆58 Updated last year
- Sambor: Boosting Segment Anything Model Towards Open-Vocabulary Learning ☆30 Updated last year
- VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation ☆26 Updated 10 months ago
- (ICCV 2023) Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation ☆47 Updated last year
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆77 Updated 8 months ago
- ☆32 Updated last year
- 「AAAI 2024」 Referred by Multi-Modality: A Unified Temporal Transformers for Video Object Segmentation ☆81 Updated last month
- ☆17 Updated 8 months ago
- [TCSVT 2024] Temporally Consistent Referring Video Object Segmentation with Hybrid Memory ☆17 Updated 3 months ago
- Codes for ICML 2023 Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation ☆37 Updated last year
- [IEEE TCSVT] Official Pytorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation. ☆44 Updated 6 months ago
- Disentangled Pre-training for Human-Object Interaction Detection ☆25 Updated last month
- [AAAI2024] Code Release of CLIM: Contrastive Language-Image Mosaic for Region Representation ☆29 Updated last year
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory" ☆67 Updated 9 months ago
- Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models ☆17 Updated last month
- Open-vocabulary Semantic Segmentation ☆33 Updated last year
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆98 Updated last month
- [ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference ☆86 Updated 3 months ago
- CVPR2022 - Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation ☆23 Updated 2 years ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆60 Updated this week
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆56 Updated 8 months ago
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" ☆53 Updated last year
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision ☆41 Updated 4 months ago
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025) ☆38 Updated 2 months ago
- Code for the paper "Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundatio… ☆28 Updated last year