yu-rp / apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
☆60 · Updated 3 months ago
Alternatives and similar repositories for apiprompting:
Users who are interested in apiprompting are comparing it to the libraries listed below:
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024) ☆66 · Updated 3 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆56 · Updated 4 months ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆44 · Updated last month
- [CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt… ☆34 · Updated last month
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" ☆31 · Updated 10 months ago
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆61 · Updated 5 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆62 · Updated 7 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆26 · Updated 2 months ago
- [ICLR 2025] SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation ☆21 · Updated last week
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆54 · Updated 7 months ago
- [NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression ☆51 · Updated 5 months ago
- Task Residual for Tuning Vision-Language Models (CVPR 2023) ☆68 · Updated last year
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ☆77 · Updated 10 months ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆33 · Updated last month
- ☆37 · Updated 2 months ago
- Adapting LLaMA Decoder to Vision Transformer ☆26 · Updated 8 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆65 · Updated 3 months ago
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction". ☆51 · Updated 3 weeks ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024) ☆43 · Updated 6 months ago
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs ☆95 · Updated 2 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆122 · Updated 2 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆62 · Updated 7 months ago
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights ☆35 · Updated 3 months ago
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆137 · Updated last week
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆72 · Updated 4 months ago
- Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs". ☆44 · Updated 5 months ago
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images. ☆25 · Updated 8 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆32 · Updated 7 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆95 · Updated last month
- LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft ☆42 · Updated 6 months ago