alibaba / AICITY2024_Track2_AliOpenTrek_CityLLaVA
☆39 Updated 7 months ago
Alternatives and similar repositories for AICITY2024_Track2_AliOpenTrek_CityLLaVA:
Users who are interested in AICITY2024_Track2_AliOpenTrek_CityLLaVA are comparing it to the libraries listed below.
- ☆30 Updated 6 months ago
- ☆29 Updated 3 months ago
- [TPAMI reviewing] Towards Visual Grounding: A Survey ☆93 Updated last week
- DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution ☆39 Updated this week
- A systematic survey of multi-modal and multi-task visual understanding foundation models for driving scenarios ☆48 Updated 8 months ago
- ✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆93 Updated last week
- Open-vocabulary Semantic Segmentation ☆34 Updated last year
- Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster. ☆47 Updated 2 months ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆88 Updated last month
- Official implementation of the CVPR paper Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent … ☆25 Updated last year
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆66 Updated 3 weeks ago
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023) ☆55 Updated 10 months ago
- This repository compiles a list of papers related to Video LLM. ☆19 Updated 7 months ago
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024) ☆63 Updated last year
- Open-Vocabulary Panoptic Segmentation ☆22 Updated 5 months ago
- ☆22 Updated last month
- ☆79 Updated last year
- (NeurIPS 2024) Official repository of paper "Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models" ☆26 Updated 3 weeks ago
- Official repository for the NuScenes-MQA. This paper was accepted by the LLVM-AD Workshop at WACV 2024. ☆25 Updated last year
- Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model". ☆100 Updated 2 months ago
- [ICCV 2023] Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment ☆43 Updated last year
- A collection of visual instruction tuning datasets. ☆76 Updated 11 months ago
- The official implementation of RAR ☆81 Updated 10 months ago
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆106 Updated 8 months ago
- Code for "DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets", accepted at NeurIPS 2023 (Main confer… ☆21 Updated 10 months ago
- [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation ☆86 Updated last month
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆44 Updated 8 months ago
- [NeurIPS 2024 Spotlight ⭐️] Parameter-Inverted Image Pyramid Networks (PIIP) ☆85 Updated last month
- Codes for ICML 2023 Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation ☆37 Updated last year
- This is the official repo for Contrastive Vision-Language Alignment Makes Efficient Instruction Learner. ☆20 Updated last year