Support, annotation, evaluation, and baseline models for the imSitu dataset.
☆60May 18, 2020Updated 5 years ago
Alternatives and similar repositories for imSitu
Users who are interested in imSitu are comparing it to the libraries listed below
- Situation With Groundings (SWiG) dataset and Joint Situation Localizer (JSL)☆70Mar 19, 2021Updated 4 years ago
- Visual Verb Sense Disambiguation☆13Apr 26, 2019Updated 6 years ago
- ☆84Apr 12, 2021Updated 4 years ago
- ☆22Dec 18, 2016Updated 9 years ago
- PyTorch implementation for our CVPR 2020 Paper "Attention-based Context Aware Reasoning for Situation Recognition"☆20Oct 20, 2020Updated 5 years ago
- [ICCV 2019] Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations☆31Aug 6, 2021Updated 4 years ago
- Contains code for the EMNLP paper "Learning Linguistic Attributes for Zero-Shot Verb Classification"☆26Mar 20, 2018Updated 7 years ago
- Code of the Grounded MUIE model, REAMO☆11Dec 3, 2024Updated last year
- Large-scale city camera video dataset☆10Jul 20, 2020Updated 5 years ago
- PyTorch implementation of Learning Similarity between Scene Graphs and Images with Transformers (GICON)☆13Nov 9, 2023Updated 2 years ago
- Feature resources of "Diagnosing the Environment Bias in Vision-and-Language Navigation"☆16May 6, 2020Updated 5 years ago
- A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision…☆37Apr 25, 2021Updated 4 years ago
- ☆14Dec 9, 2023Updated 2 years ago
- CVPR2025: Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning☆38Mar 21, 2025Updated 11 months ago
- [COLING 2018] Learning Visually-Grounded Semantics from Contrastive Adversarial Samples.☆57Sep 12, 2019Updated 6 years ago
- Toolkit for the VLOG dataset☆37Mar 30, 2018Updated 7 years ago
- [CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering☆20Sep 21, 2024Updated last year
- ☆69Feb 25, 2019Updated 7 years ago
- Improving Visual Relation Detection using Depth Maps (ICPR 2020)☆47Jul 24, 2022Updated 3 years ago
- RL framework for embodied agents based on PyTorch☆11Apr 11, 2019Updated 6 years ago
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models☆49Jan 8, 2025Updated last year
- [EMNLP 2021] Code and data for our paper "Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers…☆20Jan 17, 2022Updated 4 years ago
- ☆44Mar 8, 2021Updated 4 years ago
- Code for Knowledge-Embedded Routing Network for Scene Graph Generation (CVPR 2019)☆123Aug 17, 2022Updated 3 years ago
- Disentangled Pre-training for Human-Object Interaction Detection☆27Sep 17, 2025Updated 5 months ago
- Detectron for image/video region feature extraction, inspired by Xinlei's repo☆22Nov 21, 2020Updated 5 years ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)☆52Jul 16, 2024Updated last year
- Code for the CVPR 2020 paper 'Action Modifiers: Learning from Adverbs in Instructional Videos'☆23May 17, 2021Updated 4 years ago
- code for ACL 2023 paper 'Event Extraction as Question Generation and Answering'☆25Aug 13, 2023Updated 2 years ago
- VIST storytelling evaluation tool☆21Dec 5, 2023Updated 2 years ago
- Factorizable Net (Multi-GPU version): An Efficient Subgraph-based Framework for Scene Graph Generation☆220Jul 25, 2019Updated 6 years ago
- ☆27Oct 7, 2021Updated 4 years ago
- Scene Graph Prediction with Limited Labels☆54Oct 3, 2023Updated 2 years ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 8 months ago
- Code for Greedy Gradient Ensemble for Visual Question Answering (ICCV 2021, Oral)☆27Mar 28, 2022Updated 3 years ago
- Methods of training NLP models to ignore biased strategies☆55May 22, 2023Updated 2 years ago
- This is our PyTorch implementation of Multi-level Scene Description Network (MSDN) proposed in our ICCV 2017 paper.☆230Nov 19, 2019Updated 6 years ago
- ☆25Apr 16, 2022Updated 3 years ago
- Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine learned image …☆563Aug 21, 2021Updated 4 years ago