ivattyue / SC-Tune
Official code for the CVPR 2024 paper "SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models"
☆16Updated 11 months ago
Alternatives and similar repositories for SC-Tune:
Users who are interested in SC-Tune are comparing it to the repositories listed below.
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights☆41Updated 5 months ago
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention☆50Updated 3 months ago
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"☆153Updated 2 months ago
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'☆89Updated last week
- Evolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward☆18Updated last week
- [CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention☆27Updated 8 months ago
- Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation".☆22Updated last month
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆81Updated last year
- ☆11Updated 5 months ago
- Code for paper: Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection☆20Updated 2 weeks ago
- [ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding"☆85Updated 3 months ago
- [Open LLaVA-Video-R1] ✨ First Adaptation of R1 to LLaVA-Video☆19Updated last week
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆33Updated last year
- ☆107Updated last month
- 🔥CVPR 2025 Multimodal Large Language Models Paper List☆119Updated 2 weeks ago
- This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati…☆64Updated 9 months ago
- ☆90Updated this week
- [NeurIPS 2024] The official code for the paper "Automated Multi-level Preference for MLLMs"☆19Updated 6 months ago
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension☆48Updated 11 months ago
- Instruction Tuning in Continual Learning paradigm☆44Updated last month
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)☆77Updated 5 months ago
- [NeurIPS 2023] The official implementation of SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation☆31Updated last year
- FineCLIP: Self-distilled Region-based CLIP for Better Fine-grained Understanding☆12Updated 3 months ago
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning☆26Updated last week
- Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs".☆45Updated 7 months ago
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs☆109Updated 4 months ago
- Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models"☆33Updated last month
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models☆79Updated 5 months ago
- [Blog 1] Recording a bug of grpo_trainer in some R1 projects☆18Updated last month
- MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer☆41Updated 6 months ago