alibaba / AICITY2024_Track2_AliOpenTrek_CityLLaVA
☆32 · Updated 4 months ago
Related projects
Alternatives and complementary repositories for AICITY2024_Track2_AliOpenTrek_CityLLaVA
- A systematic survey of multi-modal and multi-task visual understanding foundation models for driving scenarios ☆46 · Updated 5 months ago
- DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution ☆39 · Updated this week
- ☆37 · Updated this week
- [ECCV 2024] The official code for "Dolphins: Multimodal Language Model for Driving" ☆42 · Updated 3 months ago
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆65 · Updated 4 months ago
- Official implementation of the CVPR paper Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent … ☆25 · Updated last year
- Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving ☆71 · Updated 10 months ago
- [CVPR2023] The official implementation of the paper "DETRs with Hybrid Matching" ☆27 · Updated 2 years ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆76 · Updated 5 months ago
- ☆78 · Updated 9 months ago
- ☆27 · Updated 3 months ago
- ☆16 · Updated 2 years ago
- ☆12 · Updated 5 months ago
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆35 · Updated 5 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆90 · Updated 3 months ago
- Code for the ICML 2023 paper "Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation" ☆35 · Updated last year
- The official implementation of RAR ☆72 · Updated 7 months ago
- OpenMMLab Detection Toolbox and Benchmark for V3Det ☆15 · Updated 7 months ago
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training ☆90 · Updated this week
- [AAAI2024] Code Release of CLIM: Contrastive Language-Image Mosaic for Region Representation ☆24 · Updated 9 months ago
- ☆20 · Updated 11 months ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding" … ☆33 · Updated last week
- This is the official repo for Contrastive Vision-Language Alignment Makes Efficient Instruction Learner ☆20 · Updated 11 months ago
- Official PyTorch implementation of CODA-LM (https://arxiv.org/abs/2404.10595) ☆66 · Updated last week
- ☆46 · Updated 5 months ago
- Large Multimodal Model ☆15 · Updated 7 months ago
- ☆56 · Updated last year
- VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation ☆18 · Updated last month
- ☆50 · Updated 3 months ago
- ☆53 · Updated last month