SooLab / REP-ERU
[ECCV2022] A PyTorch implementation of the paper "Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding"
⭐ 13 · Updated 2 years ago
Alternatives and similar repositories for REP-ERU
Users that are interested in REP-ERU are comparing it to the libraries listed below
- E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) · ⭐ 62 · Updated 7 months ago
- [CVPR 2024] Data and benchmark code for the EgoExoLearn dataset · ⭐ 68 · Updated last week
- [NeurIPS 2023] OV-PARTS: Towards Open-Vocabulary Part Segmentation · ⭐ 88 · Updated last year
- [ECCV 2024] ControlCap: Controllable Region-level Captioning · ⭐ 79 · Updated 10 months ago
- [NeurIPS 2022 Spotlight] RLIP: Relational Language-Image Pre-training and a series of other methods to solve HOI detection and Scene Grap… · ⭐ 77 · Updated last year
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models · ⭐ 37 · Updated last year
- Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions". · ⭐ 111 · Updated 4 months ago
- Official implementation of the paper "Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model" · ⭐ 63 · Updated 2 years ago
- 【AAAI 2024】 Referred by Multi-Modality: A Unified Temporal Transformers for Video Object Segmentation · ⭐ 82 · Updated 2 months ago
- [CVPR 2024 Champions][ICLR 2025] Solutions for EgoVis Challenges in CVPR 2024 · ⭐ 129 · Updated 3 months ago
- ICLR’24 Official Implementation of Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization · ⭐ 73 · Updated last year
- Egocentric Video Understanding Dataset (EVUD) · ⭐ 31 · Updated last year
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval · ⭐ 39 · Updated 4 months ago
- Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model' · ⭐ 33 · Updated last year
- [IJCV 2025] VLPrompt-PSG: Vision-Language Prompting for Panoptic Scene Graph Generation · ⭐ 27 · Updated 11 months ago
- OVAD: Open-vocabulary Attribute Detection code · ⭐ 31 · Updated 2 years ago
- ⭐ 58 · Updated 2 years ago
- Large-Vocabulary Video Instance Segmentation dataset · ⭐ 91 · Updated last year
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models · ⭐ 47 · Updated 7 months ago
- (ICCV 2023) Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation · ⭐ 47 · Updated last year
- [CVPR'25] EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering · ⭐ 37 · Updated 2 months ago
- [CVPR2023] The code for 【Position-guided Text Prompt for Vision-Language Pre-training】 · ⭐ 152 · Updated 2 years ago
- ⭐ 37 · Updated last year
- Disentangled Pre-training for Human-Object Interaction Detection · ⭐ 25 · Updated 2 months ago
- ⭐ 24 · Updated last year
- Test-Time Training on Video Streams · ⭐ 64 · Updated 2 years ago
- Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples · ⭐ 30 · Updated 9 months ago
- MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge (ICCV 2023) · ⭐ 31 · Updated last year
- (NeurIPS2023) CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection · ⭐ 116 · Updated last year
- CVPR2022 - Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation · ⭐ 23 · Updated 3 years ago