[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
☆61 Aug 17, 2021 Updated 4 years ago
Alternatives and similar repositories for VidSitu
Users that are interested in VidSitu are comparing it to the libraries listed below.
- ☆14 Dec 9, 2023 Updated 2 years ago
- Condensed Movies Challenge 2021 ☆20 Sep 21, 2022 Updated 3 years ago
- SGAP-Net: Semantic-Guided Attentive Prototypes Network for Few-Shot Human-Object Interaction Recognition, AAAI2020. ☆14 Dec 15, 2020 Updated 5 years ago
- A video database bridging human actions and human-object relationships ☆159 Jun 30, 2020 Updated 5 years ago
- VisualCOMET: Reasoning about the Dynamic Context of a Still Image ☆88 Jun 12, 2023 Updated 2 years ago
- Repo for ICCV 2021 paper: Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering ☆29 Jul 1, 2024 Updated last year
- Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20] ☆195 Sep 21, 2022 Updated 3 years ago
- [EMNLP 2020] What is More Likely to Happen Next? Video-and-Language Future Event Prediction ☆51 Aug 20, 2022 Updated 3 years ago
- ☆87 Mar 4, 2024 Updated 2 years ago
- Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models (ACL-Findings 2024) ☆16 Apr 23, 2024 Updated last year
- Official implementation of the ACL 2023 paper: "Zero-shot Faithful Factual Error Correction" ☆17 Aug 14, 2023 Updated 2 years ago
- Codes for ECCV paper: "Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation" ☆16 Jul 20, 2020 Updated 5 years ago
- PyTorch implementation of Multi-modal Dense Video Captioning (CVPR 2020 Workshops) ☆144 Apr 8, 2023 Updated 2 years ago
- [ICCV 2021] Target Adaptive Context Aggregation for Video Scene Graph Generation ☆59 Aug 27, 2022 Updated 3 years ago
- MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions ☆174 Oct 22, 2023 Updated 2 years ago
- Codebase for VidHal: Benchmarking Hallucinations in Vision LLMs ☆14 Apr 19, 2025 Updated 11 months ago
- Referring expression comprehension on ReferIt(RefClef) ☆10 Nov 28, 2016 Updated 9 years ago
- MERLOT: Multimodal Neural Script Knowledge Models ☆226 Mar 15, 2022 Updated 4 years ago
- [CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning… ☆730 Aug 8, 2023 Updated 2 years ago
- ☆17 Nov 14, 2022 Updated 3 years ago
- This is a repository for paper titled, PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Plann… ☆14 Nov 3, 2023 Updated 2 years ago
- Code for the HowTo100M paper ☆298 Mar 10, 2020 Updated 6 years ago
- [CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606) ☆69 Jun 10, 2020 Updated 5 years ago
- Implementation for the CVPR2019 paper "Graphical Contrastive Losses for Scene Graph Generation" ☆201 Apr 2, 2020 Updated 5 years ago
- Data and code for CVPR 2020 paper: "VIOLIN: A Large-Scale Dataset for Video-and-Language Inference" ☆161 Apr 29, 2020 Updated 5 years ago
- Codes for paper "Towards Diverse Paragraph Captioning for Untrimmed Videos". CVPR 2021 ☆66 Oct 21, 2021 Updated 4 years ago
- Situation With Groundings (SWiG) dataset and Joint Situation Localizer (JSL) ☆70 Mar 19, 2021 Updated 5 years ago
- Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners ☆117 Sep 15, 2022 Updated 3 years ago
- Tools for movie and video research ☆306 Jun 20, 2022 Updated 3 years ago
- CVPR2022: Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency ☆18 Aug 10, 2022 Updated 3 years ago
- [NeurIPS 2022] Egocentric Video-Language Pretraining ☆258 May 9, 2024 Updated last year
- [TPAMI 2024] This is the official Pytorch code for our paper "Context Disentangling and Prototype Inheriting for Robust Visual Grounding"… ☆28 May 8, 2025 Updated 10 months ago
- [CVPR 2021] Multi-shot Temporal Event Localization: a Benchmark ☆55 Mar 19, 2022 Updated 4 years ago
- ☆13 Feb 14, 2022 Updated 4 years ago
- ☆34 Jun 2, 2023 Updated 2 years ago
- Learning Situation Hyper-Graphs for Video Question Answering ☆22 Feb 16, 2024 Updated 2 years ago
- Implementation of "Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning" (https://arxiv.… ☆26 Nov 3, 2018 Updated 7 years ago
- Evaluation script for VoxMovies dataset in PyTorch ☆23 Jan 12, 2024 Updated 2 years ago
- Code release for Learning to Assemble Neural Module Tree Networks for Visual Grounding (ICCV 2019) ☆39 Nov 23, 2019 Updated 6 years ago