alibaba / AICITY2024_Track2_AliOpenTrek_CityLLaVA
☆48 Updated 11 months ago
Alternatives and similar repositories for AICITY2024_Track2_AliOpenTrek_CityLLaVA
Users who are interested in AICITY2024_Track2_AliOpenTrek_CityLLaVA are comparing it to the repositories listed below.
- ☆41 Updated last week
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆91 Updated 5 months ago
- [CVPR 2025] DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution ☆51 Updated 3 months ago
- [NeurIPS 2024 Spotlight ⭐️] Parameter-Inverted Image Pyramid Networks (PIIP) ☆92 Updated last month
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024) ☆67 Updated last year
- [CVPR2024 Highlight] Official repository of the paper "The devil is in the fine-grained details: Evaluating open-vocabulary object detect… ☆56 Updated 2 months ago
- 【IEEE T-IV】A systematic survey of multi-modal and multi-task visual understanding foundation models for driving scenarios ☆51 Updated last year
- (NeurIPS 2024) Official repository of paper "Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models" ☆29 Updated 3 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆98 Updated 11 months ago
- ☆59 Updated last month
- [CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding ☆46 Updated 3 weeks ago
- ☆45 Updated 6 months ago
- Open-vocabulary Semantic Segmentation ☆33 Updated last year
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆46 Updated last year
- ☆80 Updated 7 months ago
- ☆29 Updated 5 months ago
- InstaGen: Enhancing Object Detection by Training on Synthetic Dataset, CVPR2024 ☆81 Updated last year
- Codes for ICML 2023 Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation ☆37 Updated last year
- ☆84 Updated last year
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆55 Updated 7 months ago
- [AAAI2024] Code Release of CLIM: Contrastive Language-Image Mosaic for Region Representation ☆29 Updated last year
- [ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding ☆58 Updated 8 months ago
- [WACV 2025] Official code for our paper "Enhancing Novel Object Detection via Cooperative Foundational Models" ☆79 Updated 3 months ago
- ☆12 Updated 6 months ago
- [IEEE TCSVT] Official Pytorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation. ☆43 Updated 5 months ago
- ☆32 Updated last year
- Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model". ☆153 Updated 6 months ago
- (ECCV 2024) Can OOD Object Detectors Learn from Foundation Models? ☆25 Updated 6 months ago
- Official repo for our ICML 23 paper: "Multi-Modal Classifiers for Open-Vocabulary Object Detection" ☆93 Updated 2 years ago
- The official implementation of RAR ☆88 Updated last year