alibaba / AICITY2024_Track2_AliOpenTrek_CityLLaVA
☆36 · Updated 6 months ago
Alternatives and similar repositories for AICITY2024_Track2_AliOpenTrek_CityLLaVA:
Users who are interested in AICITY2024_Track2_AliOpenTrek_CityLLaVA are comparing it to the repositories listed below:
- Open-vocabulary Semantic Segmentation ☆35 · Updated 11 months ago
- Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster. ☆44 · Updated last month
- DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution ☆39 · Updated 2 months ago
- [NeurIPS 24] MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks ☆101 · Updated last month
- A systematic survey of multi-modal and multi-task visual understanding foundation models for driving scenarios ☆48 · Updated 7 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆91 · Updated 6 months ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆84 · Updated this week
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆42 · Updated 7 months ago
- Code for Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking ☆21 · Updated 3 months ago
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023) ☆54 · Updated 9 months ago
- ✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆88 · Updated last month
- [IEEE TCSVT] Official PyTorch implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation. ☆37 · Updated last week
- AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segm… ☆71 · Updated 3 weeks ago
- The official implementation of JM3D. ☆28 · Updated last year
- Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model". ☆89 · Updated last month
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models ☆36 · Updated last week
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆63 · Updated 2 months ago
- Open-Vocabulary Panoptic Segmentation ☆20 · Updated 4 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆32 · Updated 7 months ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆45 · Updated 2 months ago
- Making LLaVA Tiny via MoE-Knowledge Distillation ☆77 · Updated last week