MacavityT / REF-VLM
☆25 Updated 2 months ago
Alternatives and similar repositories for REF-VLM
Users who are interested in REF-VLM are comparing it to the repositories listed below
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆51 Updated 5 months ago
- ☆51 Updated last month
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning" ☆20 Updated 7 months ago
- ☆17 Updated last month
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries". ☆56 Updated last year
- [arXiv'25] Official Implementation of "Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning" ☆17 Updated 4 months ago
- [AAAI2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues ☆55 Updated last month
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆27 Updated last month
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆46 Updated 2 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆40 Updated 11 months ago
- Official implementation of Add-SD: Rational Generation without Manual Reference. ☆27 Updated 9 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆42 Updated 10 months ago
- ☆81 Updated 2 months ago
- ☆30 Updated 4 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆69 Updated 7 months ago
- Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer" ☆37 Updated 2 weeks ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models" ☆26 Updated 2 months ago
- LEO: A powerful Hybrid Multimodal LLM ☆18 Updated 4 months ago
- ☆42 Updated 3 weeks ago
- 🔥 Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resamplin… ☆30 Updated last week
- ☆59 Updated 2 weeks ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆59 Updated 3 months ago
- ☆43 Updated 5 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks ☆34 Updated last year
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆24 Updated 7 months ago
- This is the official repo for ByteVideoLLM/Dynamic-VLM ☆20 Updated 5 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ☆18 Updated 7 months ago
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆76 Updated 4 months ago
- Harnessing CLIP, DINO and SAM for Open Vocabulary Segmentation ☆58 Updated 3 months ago
- Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples ☆23 Updated 6 months ago