ltttpku / CMMP
☆18 Updated 2 months ago
Alternatives and similar repositories for CMMP:
Users that are interested in CMMP are comparing it to the libraries listed below
- Disentangled Pre-training for Human-Object Interaction Detection ☆18 Updated 2 months ago
- ☆22 Updated last year
- [ECCV 2024 Best Paper Candidate] Implementation of "Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Vi… ☆48 Updated 3 months ago
- ☆12 Updated 2 months ago
- Vision Relation Transformer for Unbiased Scene Graph Generation (ICCV 2023) ☆22 Updated last year
- ☆34 Updated last year
- The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition" ☆66 Updated this week
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆79 Updated 9 months ago
- VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation ☆22 Updated 3 months ago
- ☆47 Updated 2 years ago
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆60 Updated 5 months ago
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences ☆32 Updated 3 months ago
- [NeurIPS 2023] LMC: Large Model Collaboration with Cross-assessment for Training-Free Open-Set Object Recognition ☆17 Updated 7 months ago
- [NeurIPS 2024] Official code for paper "EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection" ☆23 Updated last month
- The official repository for paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". ☆25 Updated this week
- Code for the paper "Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundatio… ☆27 Updated last year
- Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection". ☆32 Updated 4 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆54 Updated 7 months ago
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning ☆39 Updated last year
- ☆16 Updated last year
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning ☆27 Updated 2 years ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" ☆31 Updated 10 months ago
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images. ☆25 Updated 8 months ago
- ☆27 Updated 3 months ago
- [AAAI2024] Code Release of CLIM: Contrastive Language-Image Mosaic for Region Representation ☆27 Updated 11 months ago
- [TPAMI reviewing] Towards Visual Grounding: A Survey ☆42 Updated this week
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024 ☆31 Updated last month
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision ☆27 Updated 2 months ago
- [ECCV 2022] GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval ☆16 Updated 2 years ago
- Official implementation of TagAlign ☆34 Updated last month