Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
☆26Jan 14, 2025Updated last year
Alternatives and similar repositories for PIN
Users who are interested in PIN are comparing it to the libraries listed below.
- MCPL: MULTI-CONCEPT PROMPT LEARNING☆20May 27, 2024Updated last year
- [ICLR 2025] Official code repository for "TULIP: Token-length Upgraded CLIP"☆33Jan 26, 2026Updated last month
- ☆33Nov 4, 2024Updated last year
- [NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles PyTorch Implementation☆14Oct 7, 2023Updated 2 years ago
- Web-grounded natural language instructions☆18Nov 25, 2024Updated last year
- ☆16Sep 29, 2024Updated last year
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆43Mar 11, 2025Updated 11 months ago
- Diverse Video Generation using a Gaussian Process Trigger☆18Dec 13, 2022Updated 3 years ago
- Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization☆25Apr 14, 2025Updated 10 months ago
- ☆19Dec 6, 2023Updated 2 years ago
- Can 3D Vision-Language Models Truly Understand Natural Language?☆20Mar 28, 2024Updated last year
- [ECCV'22 Poster] Explicit Image Caption Editing☆22Nov 30, 2022Updated 3 years ago
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24]☆62Dec 10, 2024Updated last year
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]☆24Aug 13, 2024Updated last year
- An official PyTorch implementation for CLIPPR☆30Jul 22, 2023Updated 2 years ago
- ☆30Jun 25, 2024Updated last year
- Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)☆186Jul 5, 2024Updated last year
- ☆63Mar 22, 2024Updated last year
- ☆30Mar 2, 2023Updated 3 years ago
- Training code for CLIP-FlanT5☆30Jul 29, 2024Updated last year
- Visual RAG using less than 300 lines of code.☆30Mar 2, 2024Updated last year
- Official PyTorch implementation of the paper "Semantic Diversity Learning for Zero-Shot Multi-label Classification" (ICCV 2021)☆31Aug 23, 2022Updated 3 years ago
- Generative Bias for Robust Visual Question Answering (CVPR 2023)☆28Jul 4, 2023Updated 2 years ago
- Official implementation of the paper The Hidden Language of Diffusion Models☆77Jan 24, 2024Updated 2 years ago
- ☆17Sep 1, 2024Updated last year
- FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models☆32Nov 27, 2025Updated 3 months ago
- Official Implementation of Attentive Mask CLIP (ICCV2023, https://arxiv.org/abs/2212.08653)☆35May 29, 2024Updated last year
- Official implementation of TagAlign☆35Dec 11, 2024Updated last year
- ☆80Oct 17, 2024Updated last year
- Code for "Interactive Task Planning with Language Models"☆33Jan 12, 2026Updated last month
- ☆37Oct 7, 2023Updated 2 years ago
- This repository contains the registries for components, agents and services, the second part of the autonolas-v1 protocol.☆15Updated this week
- ☆13Apr 27, 2021Updated 4 years ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆45Jun 14, 2024Updated last year
- [NeurIPS 2024] Official implementation of the paper "MambaLRP: Explaining Selective State Space Sequence Models" 🐍☆45Nov 6, 2024Updated last year
- ☆37Sep 16, 2024Updated last year
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆39Mar 4, 2024Updated last year
- [NeurIPS'24] Official implementation of paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models".☆38Oct 23, 2024Updated last year
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆86Oct 26, 2025Updated 4 months ago