[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
☆61 · Aug 17, 2021 · Updated 4 years ago
Alternatives and similar repositories for VidSitu
Users that are interested in VidSitu are comparing it to the libraries listed below
- Condensed Movies Challenge 2021 ☆20 · Sep 21, 2022 · Updated 3 years ago
- ☆14 · Dec 9, 2023 · Updated 2 years ago
- A video database bridging human actions and human-object relationships ☆156 · Jun 30, 2020 · Updated 5 years ago
- Codes for ECCV paper: "Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation" ☆16 · Jul 20, 2020 · Updated 5 years ago
- MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions ☆172 · Oct 22, 2023 · Updated 2 years ago
- PyTorch implementation of Multi-modal Dense Video Captioning (CVPR 2020 Workshops) ☆144 · Apr 8, 2023 · Updated 2 years ago
- Repo for ICCV 2021 paper: Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering ☆29 · Jul 1, 2024 · Updated last year
- VisualCOMET: Reasoning about the Dynamic Context of a Still Image ☆88 · Jun 12, 2023 · Updated 2 years ago
- ☆87 · Mar 4, 2024 · Updated 2 years ago
- [CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning… ☆729 · Aug 8, 2023 · Updated 2 years ago
- MERLOT: Multimodal Neural Script Knowledge Models ☆226 · Mar 15, 2022 · Updated 3 years ago
- Situation With Groundings (SWiG) dataset and Joint Situation Localizer (JSL) ☆70 · Mar 19, 2021 · Updated 4 years ago
- (ACM MM24) This is the official repository of GIST: Improving Parameter Efficient Fine Tuning via Knowledge Interaction. ☆11 · Jan 28, 2024 · Updated 2 years ago
- [EMNLP 2020] What is More Likely to Happen Next? Video-and-Language Future Event Prediction ☆51 · Aug 20, 2022 · Updated 3 years ago
- [CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606) ☆69 · Jun 10, 2020 · Updated 5 years ago
- PyTorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners ☆116 · Sep 15, 2022 · Updated 3 years ago
- Data and code for CVPR 2020 paper: "VIOLIN: A Large-Scale Dataset for Video-and-Language Inference" ☆161 · Apr 29, 2020 · Updated 5 years ago
- ☆96 · Feb 14, 2022 · Updated 4 years ago
- [ICCV 2021] Target Adaptive Context Aggregation for Video Scene Graph Generation ☆59 · Aug 27, 2022 · Updated 3 years ago
- Implementation for the CVPR2019 paper "Graphical Contrastive Losses for Scene Graph Generation" ☆201 · Apr 2, 2020 · Updated 5 years ago
- Code for the HowTo100M paper ☆293 · Mar 10, 2020 · Updated 5 years ago
- Code and Data for ACL 2023 paper I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors ☆16 · Jun 7, 2023 · Updated 2 years ago
- [ACL 2023] PyTorch Implementation of Zero- and Few-Shot Event Detection via Prompt-Based Meta Learning ☆16 · Jun 6, 2023 · Updated 2 years ago
- Referring expression comprehension on ReferIt(RefClef) ☆10 · Nov 28, 2016 · Updated 9 years ago
- Tools for movie and video research ☆305 · Jun 20, 2022 · Updated 3 years ago
- [NeurIPS 2022] Egocentric Video-Language Pretraining ☆256 · May 9, 2024 · Updated last year
- ☆34 · Jun 2, 2023 · Updated 2 years ago
- CVPR 2022: Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency ☆18 · Aug 10, 2022 · Updated 3 years ago
- THEORY OF SPACE: a benchmark for evaluating whether foundation models can actively explore under partial observability efficiently to bui… ☆40 · Feb 27, 2026 · Updated last week
- ☆13 · Feb 14, 2022 · Updated 4 years ago
- Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models (ACL-Findings 2024) ☆16 · Apr 23, 2024 · Updated last year
- [CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations ☆565 · Aug 22, 2025 · Updated 6 months ago
- Experiments with multimodal deep learning models based on transformers ☆11 · Oct 9, 2022 · Updated 3 years ago
- A curated list of grounding natural language in video and related area. :-) ☆102 · Mar 31, 2022 · Updated 3 years ago
- Adaptive Offline Quintuplet Loss for Image-Text Matching (AOQ) ☆34 · Jul 2, 2020 · Updated 5 years ago
- awesome grounding: A curated list of research papers in visual grounding ☆1,125 · Sep 21, 2025 · Updated 5 months ago
- ☆22 · Feb 25, 2021 · Updated 5 years ago
- [ICML 2025] Official implementation of Spherical Diffusion Policy: A SE(3) Equivariant Visuomotor Policy with Spherical Fourier Represent… ☆39 · Jul 8, 2025 · Updated 7 months ago
- ☆17 · Nov 14, 2022 · Updated 3 years ago