taco-group / MapBench
☆20Updated 3 weeks ago
Alternatives and similar repositories for MapBench:
Users who are interested in MapBench are comparing it to the repositories listed below
- ☆13Updated last week
- Official Implementation of ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25)☆18Updated last month
- LEO: A powerful Hybrid Multimodal LLM☆17Updated 2 months ago
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…☆25Updated 3 weeks ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"☆37Updated 4 months ago
- ☆59Updated this week
- Official Implementation of DINO-Foresight: Looking into the Future with DINO☆49Updated last month
- [ICML 2024] GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model☆49Updated 4 months ago
- [CVPR 2025] Few-shot Recognition via Stage-Wise Retrieval-Augmented Finetuning☆14Updated last week
- Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".☆37Updated 6 months ago
- Code for CVPR2025 "MMRL: Multi-Modal Representation Learning for Vision-Language Models".☆25Updated 3 weeks ago
- ☆12Updated 4 months ago
- This is the implementation of CounterCurate, a data curation pipeline for both physical and semantic counterfactual image-caption pairs.☆18Updated 9 months ago
- [ICLR 2025] Code for Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models☆15Updated last month
- ☆40Updated 5 months ago
- Official Repository of Personalized Visual Instruct Tuning☆28Updated last month
- ☆17Updated last month
- The official repo for "Where do Large Vision-Language Models Look at when Answering Questions?"☆32Updated 2 weeks ago
- [CVPR2025] Official code repository for SeTa: "Scale Efficient Training for Large Datasets"☆13Updated 3 weeks ago
- ☆35Updated last month
- Source code for "To Adapt or Not to Adapt? Real-Time Adaptation for Semantic Segmentation", ICCV 2023☆48Updated 9 months ago
- Scaffold Prompting to promote LMMs☆39Updated 3 months ago
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights☆24Updated 5 months ago
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models.☆18Updated 3 months ago
- Open-Vocabulary Panoptic Segmentation☆23Updated 7 months ago
- [CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories☆29Updated 3 weeks ago
- Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding☆21Updated 2 months ago
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…☆20Updated last year
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆17Updated 5 months ago
- Official Implementation of DiffCLIP: Differential Attention Meets CLIP☆24Updated 3 weeks ago