zmykevin / UVLPView external linksLinks
CVPR 2022 (Oral) Pytorch Code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
☆22Apr 15, 2022Updated 3 years ago
Alternatives and similar repositories for UVLP
Users that are interested in UVLP are comparing it to the libraries listed below
Sorting:
- Code for "Counterfactual Variable Control for Robust and Interpretable Question Answering"☆14Oct 13, 2020Updated 5 years ago
- A pytorch implemetation of data augmentation method for visual question answering☆21May 25, 2023Updated 2 years ago
- "Describing Textures using Natural Language" code and data, ECCV 2020 Oral.☆17Aug 6, 2020Updated 5 years ago
- CVPR 2021 Official Pytorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training☆34Nov 9, 2021Updated 4 years ago
- ☆24Apr 4, 2022Updated 3 years ago
- Research code for CVPR 2022 paper: "EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching"☆26Oct 20, 2022Updated 3 years ago
- A complete end-to-end Deep Learning system to generate high quality human like speech in English for Korean Drama (WIP)☆13Sep 17, 2022Updated 3 years ago
- Code for DVD A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue☆14Oct 12, 2021Updated 4 years ago
- ORES: Open-vocabulary Responsible Visual Synthesis☆14Dec 12, 2023Updated 2 years ago
- ☆12Mar 8, 2021Updated 4 years ago
- Feature resources of "Diagnosing the Environment Bias in Vision-and-Language Navigation"☆16May 6, 2020Updated 5 years ago
- Counterfactual Samples Synthesizing for Robust VQA☆79Nov 24, 2022Updated 3 years ago
- Official Tensorflow Implementation of the AAAI-2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware P…☆59Mar 24, 2023Updated 2 years ago
- Starter code for the VMT task and challenge☆51Jul 29, 2020Updated 5 years ago
- [EMNLP2022] Transformer-based Entity Typing in Knowledge Graphs☆16Nov 26, 2024Updated last year
- This repository provides the dataset introduced by our WSSTG paper☆13Jul 21, 2019Updated 6 years ago
- An image-oriented evaluation tool for image captioning systems (EMNLP-IJCNLP 2019)☆37May 3, 2020Updated 5 years ago
- Visual Relation Grounding in Videos (ECCV'20, Spotlight)☆57Dec 8, 2022Updated 3 years ago
- End-to-end Multi-modal Video Temporal Grounding, NeurIPS 2021☆18Oct 24, 2021Updated 4 years ago
- Code for ACL 2020 paper "Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA." Hyounghun Kim, Zineng T…☆34May 14, 2020Updated 5 years ago
- A Fast and Accurate One-Stage Approach to Visual Grounding, ICCV 2019 (Oral)☆149Nov 18, 2020Updated 5 years ago
- Implementation for CVPR 2020 Paper "Two Causal Principles for Improving Visual Dialog"☆31Feb 19, 2023Updated 2 years ago
- Code for paper, "TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency" ECCV 2022☆39Feb 17, 2023Updated 2 years ago
- PyTorch implementation of "TALL: Temporal Activity Localization via Language Query. Gao et al. ICCV2017."☆14Apr 20, 2019Updated 6 years ago
- support Large Vocabulary Instance Segmentation (LVIS) dataset for mmdetection☆16Apr 24, 2020Updated 5 years ago
- Human-like Controllable Image Captioning with Verb-specific Semantic Roles.☆36Mar 11, 2022Updated 3 years ago
- my Ph.D. thesis (Zhejiang University)☆38Apr 9, 2022Updated 3 years ago
- MAC: Mining Activity Concepts for Language-based Temporal Localization☆36Nov 26, 2018Updated 7 years ago
- ☆44Aug 2, 2021Updated 4 years ago
- source code and pre-trained/fine-tuned checkpoint for NAACL 2021 paper LightningDOT☆72Nov 14, 2022Updated 3 years ago
- ☆17Mar 13, 2023Updated 2 years ago
- Multi-faceted Video Moment Localizer☆17Jun 19, 2020Updated 5 years ago
- Code for the paper: Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos☆71Sep 7, 2021Updated 4 years ago
- Cooperative Vision-and-Dialog Navigation☆72Nov 22, 2022Updated 3 years ago
- Implementation of Soft-Label Chain Conditional Random Field for Phrase Grounding in PyTorch☆16Oct 21, 2022Updated 3 years ago
- [EMNLP 2020] What is More Likely to Happen Next? Video-and-Language Future Event Prediction☆51Aug 20, 2022Updated 3 years ago
- Repo for ICCV 2021 paper: Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering☆28Jul 1, 2024Updated last year
- Code and data for the project "Visually grounded continual learning of compositional semantics"☆22Dec 27, 2022Updated 3 years ago
- Implementation of our IJCAI2022 oral paper, ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning.☆24Aug 5, 2023Updated 2 years ago