Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples
☆40 · Nov 27, 2024 · Updated last year
Alternatives and similar repositories for IPLoc
Users who are interested in IPLoc are comparing it to the repositories listed below.
- [CVPRW 2025] Official repository of paper titled "Towards Evaluating the Robustness of Visual State Space Models" ☆26 · Jun 8, 2025 · Updated 8 months ago
- ☆12 · Dec 20, 2024 · Updated last year
- ☆12 · Apr 18, 2025 · Updated 10 months ago
- [MICCAI 2025] Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology ☆12 · Jun 17, 2025 · Updated 8 months ago
- ☆11 · Oct 29, 2024 · Updated last year
- Code for our paper: "Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval". ☆15 · Feb 26, 2025 · Updated last year
- Code of paper "A Video Dataset for Falling Object Detection around Buildings" https://arxiv.org/abs/2408.05750 ☆17 · Jul 10, 2025 · Updated 7 months ago
- [EMNLP 2024] SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information ☆12 · Oct 11, 2024 · Updated last year
- EventHallusion: Diagnosing Event Hallucinations in Video LLMs ☆34 · Aug 5, 2025 · Updated 6 months ago
- Röttger et al. (2025): "MSTS: A Multimodal Safety Test Suite for Vision-Language Models" ☆16 · Mar 31, 2025 · Updated 11 months ago
- Public code repo for EMNLP 2024 Findings paper "MACAROON: Training Vision-Language Models To Be Your Engaged Partners" ☆14 · Sep 28, 2024 · Updated last year
- [ECCVW 2024 -- ORAL] Official repository of paper titled "Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors". ☆12 · Oct 11, 2024 · Updated last year
- Validating image classification benchmark results on ViTs and ResNets (v2) ☆13 · Nov 3, 2022 · Updated 3 years ago
- BESA is a differentiable weight pruning technique for large language models. ☆17 · Mar 4, 2024 · Updated last year
- 3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers ☆14 · Apr 17, 2023 · Updated 2 years ago
- [NAACL'25] Contains code and documentation for our VANE-Bench paper. ☆17 · Aug 19, 2025 · Updated 6 months ago
- ☆23 · Feb 4, 2026 · Updated 3 weeks ago
- Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space' ☆19 · Jul 21, 2024 · Updated last year
- (ICCV 2023) Generative Multiplane Neural Radiance for 3D Aware Image Generation. ☆19 · Sep 28, 2023 · Updated 2 years ago
- QT-DOG: QUANTIZATION-AWARE TRAINING FOR DOMAIN GENERALIZATION ☆23 · Nov 30, 2025 · Updated 3 months ago
- [CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering ☆21 · May 28, 2025 · Updated 9 months ago
- ☆21 · Oct 10, 2023 · Updated 2 years ago
- ☆47 · Nov 7, 2024 · Updated last year
- [EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models ☆138 · Aug 21, 2025 · Updated 6 months ago
- [CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo… ☆50 · Aug 23, 2024 · Updated last year
- FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration ☆20 · Jun 27, 2025 · Updated 8 months ago
- PyTorch code for the paper: "Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation" ☆19 · Aug 5, 2021 · Updated 4 years ago
- [⭐ CVPR 2025 Highlight ⭐] Official Implementation of the paper STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing fro… ☆29 · Apr 22, 2025 · Updated 10 months ago
- A Coarse-to-Fine Pseudo-Labeling (C2FPL) Framework for Unsupervised Video Anomaly Detection ☆21 · May 18, 2024 · Updated last year
- ☆23 · Mar 25, 2025 · Updated 11 months ago
- LP-OVOD: Open-Vocabulary Object Detection by Linear Probing (WACV 2024) ☆29 · Jul 23, 2024 · Updated last year
- [BMVC 2025] Official Implementation of the paper "PerSense: Personalized Instance Segmentation in Dense Images" ☆28 · Dec 18, 2025 · Updated 2 months ago
- 🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant (NeurIPS 2024) ☆118 · Mar 26, 2025 · Updated 11 months ago
- Recent Advances in Visual Dialog ☆30 · Aug 19, 2022 · Updated 3 years ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆69 · May 31, 2024 · Updated last year
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆77 · Nov 20, 2025 · Updated 3 months ago
- Code for "Don't trust your eyes: on the (un)reliability of feature visualizations" (ICML 2024) ☆34 · Nov 15, 2023 · Updated 2 years ago
- Generative Bias for Robust Visual Question Answering (CVPR 2023) ☆28 · Jul 4, 2023 · Updated 2 years ago
- [ICML 2024] Sparse Model Inversion: Efficient Inversion of Vision Transformers with Less Hallucination ☆13 · Apr 29, 2025 · Updated 10 months ago