alibaba / AICITY2024_Track2_AliOpenTrek_CityLLaVA
☆45 · Updated 9 months ago
Alternatives and similar repositories for AICITY2024_Track2_AliOpenTrek_CityLLaVA:
Users interested in AICITY2024_Track2_AliOpenTrek_CityLLaVA are comparing it to the repositories listed below.
- ☆34 · Updated 9 months ago
- Open-vocabulary Semantic Segmentation ☆34 · Updated last year
- [CVPR 2025] DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution ☆46 · Updated last month
- LP-OVOD: Open-Vocabulary Object Detection by Linear Probing (WACV 2024) ☆23 · Updated 9 months ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆90 · Updated 3 months ago
- ☆22 · Updated last year
- [IEEE TCSVT] Official PyTorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation. ☆41 · Updated 3 months ago
- 【IEEE T-IV】A systematic survey of multi-modal and multi-task visual understanding foundation models for driving scenarios ☆50 · Updated 11 months ago
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆71 · Updated 2 months ago
- [WACV 2025] Official code for our paper "Enhancing Novel Object Detection via Cooperative Foundational Models" ☆74 · Updated last month
- Code for "DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets", accepted at NeurIPS 2023 (Main confer… ☆21 · Updated last year
- Official implementation of the CVPR paper Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent … ☆25 · Updated last year
- [AAAI2024] Code Release of CLIM: Contrastive Language-Image Mosaic for Region Representation ☆28 · Updated last year
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models" ☆24 · Updated 3 weeks ago
- OVMR: Open-Vocabulary Recognition with Multi-Modal References (CVPR24) ☆27 · Updated last week
- (ECCV 2024) Can OOD Object Detectors Learn from Foundation Models? ☆25 · Updated 4 months ago
- Code Release for MaskCLIP (ICML 2023) ☆64 · Updated last year
- [CVPRW 2024] TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning. Official code for the 3rd place solution of t… ☆34 · Updated 2 months ago
- Taming Self-Training for Open-Vocabulary Object Detection, CVPR 2024 ☆22 · Updated last year
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory" ☆68 · Updated 6 months ago
- CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation ☆70 · Updated 8 months ago
- [NeurIPS 2024] OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling. ☆18 · Updated last month
- Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model". ☆129 · Updated 4 months ago
- (CVPR 2025 highlight✨) Official repository of paper "LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of La… ☆159 · Updated 2 weeks ago
- Open-Vocabulary Panoptic Segmentation ☆23 · Updated 7 months ago
- [ICCV 2023] Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment ☆43 · Updated last year
- ☆84 · Updated last year
- [NeurIPS2024] - SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion ☆74 · Updated 3 months ago
- [CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection ☆167 · Updated 3 weeks ago
- (NeurIPS 2024) Official repository of paper "Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models" ☆27 · Updated last month