LilyDaytoy / OpenPVSG
Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23
☆74 Updated 4 months ago
Related projects:
- Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023] ☆85 Updated 2 months ago
- Action Scene Graphs for Long-Form Understanding of Egocentric Videos (CVPR 2024) ☆25 Updated 2 months ago
- [ICCV 2023] RLIPv2: Fast Scaling of Relational Language-Image Pre-training ☆112 Updated 3 months ago
- ☆55 Updated 2 months ago
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM ☆47 Updated 2 months ago
- [NeurIPS 2023] OV-PARTS: Towards Open-Vocabulary Part Segmentation ☆70 Updated 2 months ago
- Multimodal Video Understanding Framework (MVU) ☆23 Updated 4 months ago
- [CVPR 2024 Champions] Solutions for EgoVis Challenges in CVPR 2024 ☆100 Updated 2 months ago
- [ICLR 2023] SQA3D for embodied scene understanding and reasoning ☆115 Updated 11 months ago
- Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆107 Updated 3 weeks ago
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds ☆80 Updated 2 months ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs ☆22 Updated 3 months ago
- Large-Vocabulary Video Instance Segmentation dataset ☆73 Updated 2 months ago
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024. ☆49 Updated this week
- Official implementation of the paper "Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model" ☆42 Updated last year
- ☆16 Updated 10 months ago
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" ☆81 Updated 5 months ago
- [CVPR 2024] ViT-Lens: Towards Omni-modal Representations ☆152 Updated 2 months ago
- Language Repository for Long Video Understanding ☆27 Updated 3 months ago
- ☆53 Updated 2 months ago
- Code for the paper Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models @ CVPR 2024 ☆53 Updated 3 months ago
- Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" ☆68 Updated last month
- ☆58 Updated this week
- Code and data release for the paper "Learning Object State Changes in Videos: An Open-World Perspective" (CVPR 2024) ☆27 Updated last week
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding ☆75 Updated 11 months ago
- [CVPR 2023] HierVL: Learning Hierarchical Video-Language Embeddings ☆43 Updated last year
- Official repo of "VideoGUI: A Benchmark for GUI Automation from Instructional Videos" ☆19 Updated 3 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆70 Updated 2 weeks ago
- [CVPR 2024] Code for HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation ☆54 Updated 2 months ago
- [CVPR2024 Highlight] Official repository of the paper "The devil is in the fine-grained details: Evaluating open-vocabulary object detect… ☆39 Updated last month