jaisidhsingh / CoN-CLIP
Implementation of the paper "Learn 'No' to Say 'Yes' Better".
☆31 · Updated this week
Alternatives and similar repositories for CoN-CLIP:
Users interested in CoN-CLIP are comparing it to the repositories listed below.
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆129 · Updated 4 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" ☆34 · Updated last year
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs ☆114 · Updated 5 months ago
- ☆65 · Updated 9 months ago
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM ☆71 · Updated 6 months ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024) ☆79 · Updated 6 months ago
- [ICLR 2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆71 · Updated 3 months ago
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆162 · Updated 3 months ago
- A collection of visual instruction tuning datasets. ☆76 · Updated last year
- ☆115 · Updated 8 months ago
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos ☆61 · Updated this week
- [ICLR 2025] Official code for the paper "MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs" ☆154 · Updated this week
- ☆42 · Updated 3 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆154 · Updated 7 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆126 · Updated 11 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆56 · Updated this week
- Visual Instruction Tuning for Qwen2 Base Model ☆32 · Updated 9 months ago
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding ☆43 · Updated 3 months ago
- ☆73 · Updated 3 months ago
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models ☆84 · Updated 6 months ago
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆92 · Updated this week
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆28 · Updated 6 months ago
- ☆80 · Updated last month
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency ☆100 · Updated 3 weeks ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆71 · Updated 10 months ago
- Evolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward ☆31 · Updated last month
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆111 · Updated 3 weeks ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆53 · Updated 9 months ago
- [CVPR 2025 Highlight] Official PyTorch codebase for the paper "Assessing and Learning Alignment of Unimodal Vision and Language Models" ☆33 · Updated last week
- Official implementation of the Law of Vision Representation in MLLMs ☆154 · Updated 5 months ago