Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples
☆40 · Nov 27, 2024 · Updated last year
Alternatives and similar repositories for IPLoc
Users that are interested in IPLoc are comparing it to the libraries listed below.
- ☆12 · Dec 20, 2024 · Updated last year
- ☆12 · Apr 18, 2025 · Updated 11 months ago
- [MICCAI 2025] Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology ☆12 · Jun 17, 2025 · Updated 9 months ago
- [CVPRW 2025] Official repository of paper titled "Towards Evaluating the Robustness of Visual State Space Models" ☆26 · Jun 8, 2025 · Updated 9 months ago
- ☆11 · Oct 29, 2024 · Updated last year
- [CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering ☆21 · May 28, 2025 · Updated 9 months ago
- Code for our paper: "Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval". ☆15 · Feb 26, 2025 · Updated last year
- [ECCVW 2024 -- ORAL] Official repository of paper titled "Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors". ☆12 · Oct 11, 2024 · Updated last year
- [CVPR 2023] Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection ☆30 · Jun 21, 2023 · Updated 2 years ago
- ☆10 · Nov 12, 2024 · Updated last year
- Code of paper "A Video Dataset for Falling Object Detection around Buildings" https://arxiv.org/abs/2408.05750 ☆18 · Jul 10, 2025 · Updated 8 months ago
- BESA is a differentiable weight pruning technique for large language models. ☆17 · Mar 4, 2024 · Updated 2 years ago
- EventHallusion: Diagnosing Event Hallucinations in Video LLMs ☆34 · Aug 5, 2025 · Updated 7 months ago
- [EMNLP 2024] SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information ☆12 · Oct 11, 2024 · Updated last year
- Validating image classification benchmark results on ViTs and ResNets (v2) ☆13 · Nov 3, 2022 · Updated 3 years ago
- 3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers ☆14 · Apr 17, 2023 · Updated 2 years ago
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024) ☆186 · Jul 5, 2024 · Updated last year
- ☆47 · Nov 7, 2024 · Updated last year
- Associate Everything Detected: Facilitating Tracking-by-Detection to the Unknown ☆41 · Feb 22, 2026 · Updated last month
- [EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models ☆140 · Aug 21, 2025 · Updated 7 months ago
- ☆21 · Oct 10, 2023 · Updated 2 years ago
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions ☆55 · Apr 17, 2024 · Updated last year
- Röttger et al. (2025): "MSTS: A Multimodal Safety Test Suite for Vision-Language Models" ☆16 · Mar 31, 2025 · Updated 11 months ago
- LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections (NeurIPS 2023) ☆29 · Dec 27, 2023 · Updated 2 years ago
- Public code repo for EMNLP 2024 Findings paper "MACAROON: Training Vision-Language Models To Be Your Engaged Partners" ☆14 · Sep 28, 2024 · Updated last year
- Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space' ☆19 · Jul 21, 2024 · Updated last year
- ☆23 · Aug 19, 2024 · Updated last year
- QT-DOG: QUANTIZATION-AWARE TRAINING FOR DOMAIN GENERALIZATION ☆24 · Nov 30, 2025 · Updated 3 months ago
- [CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo… ☆50 · Aug 23, 2024 · Updated last year
- [NAACL'25] Contains code and documentation for our VANE-Bench paper. ☆23 · Aug 19, 2025 · Updated 7 months ago
- Official Implementation of the paper "DifFSS: Diffusion Model for Few-Shot Semantic Segmentation" ☆14 · Jul 26, 2023 · Updated 2 years ago
- ☆35 · Feb 5, 2024 · Updated 2 years ago
- Official PyTorch Implementation of MIANet: Aggregating Unbiased Instance and General Information for Few-Shot Semantic Segmentation(CVPR … ☆30 · Mar 15, 2024 · Updated 2 years ago
- Extended Few-Shot Learning: Exploiting Existing Resources for Novel Tasks ☆10 · Jul 6, 2021 · Updated 4 years ago
- Unified-modal Salient Object Detection via Adaptive Prompt Learning ☆12 · Oct 17, 2025 · Updated 5 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆69 · May 31, 2024 · Updated last year
- A dataset of scientific vector graphics in TikZ for training generative models. ☆25 · Feb 4, 2026 · Updated last month
- 🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant (NeurIPS 2024) ☆121 · Mar 26, 2025 · Updated 11 months ago
- ☆11 · Aug 20, 2025 · Updated 7 months ago