chengyzhao / TextPSG
☆16Updated 10 months ago
Related projects:
- Official implementation of the paper "Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model"☆42Updated last year
- [ECCV 2024] ControlCap: Controllable Region-level Captioning☆49Updated last month
- ☆11Updated 2 months ago
- Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23☆74Updated 4 months ago
- Official Pytorch codebase for Open-Vocabulary Instance Segmentation without Manual Mask Annotations [CVPR 2023]☆47Updated 9 months ago
- Code for the paper "Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundatio…☆22Updated 10 months ago
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.☆20Updated 4 months ago
- [CVPR2024 Highlight] Official repository of the paper "The devil is in the fine-grained details: Evaluating open-vocabulary object detect…☆39Updated last month
- ☆57Updated last year
- state-of-the-art open vocabulary detector on COCO/LVIS/V3Det☆23Updated 5 months ago
- ☆56Updated last year
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM☆47Updated 2 months ago
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences☆17Updated 3 weeks ago
- This repository contains the Adverbs in Recipes (AIR) dataset and the code published at the CVPR 23 paper: "Learning Action Changes by Me…☆13Updated last year
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)☆31Updated last year
- IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model☆18Updated last week
- Can 3D Vision-Language Models Truly Understand Natural Language?☆20Updated 5 months ago
- ☆43Updated 2 months ago
- Lumen: a Large multimodal model with versatile vision-centric capabilities☆16Updated 3 months ago
- [ICCV 2023] Label-Efficient Online Continual Object Detection in Streaming Video☆17Updated 8 months ago
- ☆31Updated 3 months ago
- VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation☆13Updated 2 months ago
- official repo of "VideoGUI: A Benchmark for GUI Automation from Instructional Videos"☆19Updated 3 months ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs☆22Updated 3 months ago
- ☆19Updated last month
- ☆32Updated 3 months ago
- Action Scene Graphs for Long-Form Understanding of Egocentric Videos (CVPR 2024)☆25Updated 2 months ago
- ☆29Updated 2 months ago
- OVAD: Open-vocabulary Attribute Detection code☆28Updated last year
- [CVPR 2024] Data and benchmark code for the EgoExoLearn dataset☆43Updated 2 weeks ago