om-ai-lab / GroundVLP
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)
☆51Updated 8 months ago
Related projects: ⓘ
- The official implementation of RAR☆61Updated 5 months ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs☆72Updated 3 months ago
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"☆57Updated 3 months ago
- ☆83Updated 9 months ago
- [TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.☆104Updated 2 months ago
- ☆58Updated this week
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models☆36Updated last year
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆75Updated 5 months ago
- A collection of visual instruction tuning datasets.☆74Updated 6 months ago
- ☆100Updated last month
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge☆114Updated 2 months ago
- DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution☆34Updated 2 months ago
- ☆52Updated 8 months ago
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning☆41Updated 4 months ago
- Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment☆44Updated 3 months ago
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want☆52Updated 5 months ago
- MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆116Updated 2 weeks ago
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM☆47Updated 2 months ago
- This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati…☆63Updated 3 months ago
- OvarNet official implement of the paper "OvarNet: Towards Open-vocabulary Object Attribute Recognition"☆98Updated last year
- Simple PyTorch implementation of "Libra: Building Decoupled Vision System on Large Language Models" (accepted by ICML 2024)☆41Updated 3 months ago
- ☆26Updated 6 months ago
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"☆51Updated 11 months ago
- official impelmentation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input☆44Updated 3 weeks ago
- Dense Connector for MLLMs☆98Updated last month
- FreeVA: Offline MLLM as Training-Free Video Assistant☆42Updated 3 months ago
- PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. PixelLM is accepted by CVPR 2024.☆174Updated 3 months ago
- Code for the paper: "SuS-X: Training-Free Name-Only Transfer of Vision-Language Models" [ICCV'23]☆91Updated last year
- ☆31Updated 2 months ago
- MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU☆37Updated 11 months ago