ProvenceStar / PartGLEE
[ECCV2024] PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
☆15 · Updated last week
Related projects:
- Open-vocabulary Video Instance Segmentation Codebase built upon Detectron2, which is really easy to use. ☆15 · Updated 6 months ago
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM ☆47 · Updated 2 months ago
- cliptrase ☆15 · Updated 2 weeks ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆70 · Updated 2 weeks ago
- [CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration" ☆45 · Updated 4 months ago
- state-of-the-art open vocabulary detector on COCO/LVIS/V3Det ☆23 · Updated 5 months ago
- Can 3D Vision-Language Models Truly Understand Natural Language? ☆20 · Updated 5 months ago
- [AAAI2024] Code Release of CLIM: Contrastive Language-Image Mosaic for Region Representation ☆25 · Updated 7 months ago
- (ICCV 2023) Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation ☆43 · Updated 2 months ago
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images. ☆20 · Updated 4 months ago
- [ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding. ☆27 · Updated last month
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆42 · Updated 3 months ago
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆35 · Updated last month
- [NeurIPS 2023] The official implementation of SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation ☆26 · Updated 6 months ago
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024. ☆49 · Updated last week
- [ECCV 2024] Code for Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation ☆24 · Updated 2 months ago
- Detectron2 Toolbox and Benchmark for V3Det ☆15 · Updated 3 months ago
- Large-Vocabulary Video Instance Segmentation dataset ☆73 · Updated 2 months ago
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning ☆35 · Updated 11 months ago
- The official implementation of RAR ☆61 · Updated 5 months ago
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory" ☆62 · Updated 4 months ago
- [ECCV 2022] GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval ☆16 · Updated 2 years ago
- Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆44 · Updated 3 weeks ago
- [MM2024 Oral] 3D-GRES: Generalized 3D Referring Expression Segmentation ☆15 · Updated last month
- The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition" ☆55 · Updated 5 months ago
- VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation ☆13 · Updated 3 months ago