guanghuixu / CRN_tvqa
☆15 · Oct 27, 2020 · Updated 5 years ago
Alternatives and similar repositories for CRN_tvqa
Users who are interested in CRN_tvqa are comparing it to the repositories listed below
- ☆30 · May 7, 2021 · Updated 4 years ago
- Official code for paper "Spatially Aware Multimodal Transformers for TextVQA" published at ECCV, 2020. ☆65 · Sep 15, 2021 · Updated 4 years ago
- TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral) ☆72 · May 22, 2023 · Updated 2 years ago
- Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps [AAAI2021] ☆57 · Apr 5, 2022 · Updated 3 years ago
- Graph Convolutional Module for Temporal Action Localization in Videos ☆10 · Jul 4, 2020 · Updated 5 years ago
- A modular framework for Visual Question Answering research by the FAIR A-STAR team ☆45 · Aug 26, 2021 · Updated 4 years ago
- This repository provides the dataset introduced by our WSSTG paper ☆13 · Jul 21, 2019 · Updated 6 years ago
- ☆15 · Aug 25, 2020 · Updated 5 years ago
- Official implementation for Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos ☆16 · May 23, 2023 · Updated 2 years ago
- [ACM MM 22] Correspondence Matters for Video Referring Expression Comprehension ☆15 · Sep 4, 2022 · Updated 3 years ago
- PyTorch implementation of AAAI 2021 paper: A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization ☆42 · Apr 20, 2021 · Updated 4 years ago
- ☆22 · Dec 8, 2022 · Updated 3 years ago
- Shows visual grounding methods can be right for the wrong reasons! (ACL 2020) ☆23 · Jun 26, 2020 · Updated 5 years ago
- Public repository for DORi: Discovering Object Relationships for Moment Localization of a Natural Language Query in a Video Code accompan… ☆21 · Apr 7, 2021 · Updated 4 years ago
- Code and data for the project "Visually grounded continual learning of compositional semantics" ☆22 · Dec 27, 2022 · Updated 3 years ago
- ☆20 · Sep 28, 2020 · Updated 5 years ago
- [AAAI 2021] Confidence-aware Non-repetitive Multimodal Transformers for TextCaps ☆24 · Mar 29, 2023 · Updated 2 years ago
- The code repository for "Cross-Modal and Hierarchical Modeling of Video and Text" in PyTorch ☆20 · Apr 26, 2020 · Updated 5 years ago
- Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal architecture for Scene Text Visual Question Answer… ☆55 · Oct 30, 2024 · Updated last year
- ☆27 · Oct 7, 2021 · Updated 4 years ago
- [WACV 2024] Code for "Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders" ☆25 · Aug 16, 2024 · Updated last year
- This repository contains the main baselines introduced in WSSTG (ACL 2019). ☆56 · Jul 8, 2024 · Updated last year
- EMNLP'2020: Look at the First Sentence: Position Bias in Question Answering ☆29 · Nov 4, 2020 · Updated 5 years ago
- ☆26 · Aug 4, 2020 · Updated 5 years ago
- ☆27 · Oct 19, 2022 · Updated 3 years ago
- EPIC-Kitchens-100 Action Recognition baselines: TSN, TRN, TSM ☆33 · Mar 15, 2022 · Updated 3 years ago
- [CVPR2022] Unsupervised Pre-training for Temporal Action Localization Tasks (UP-TAL) ☆29 · Mar 9, 2022 · Updated 3 years ago
- Dataset created for the Power Line Insulators Inspection Detections ☆10 · Jul 2, 2020 · Updated 5 years ago
- Implementation of paper "Not All Frames Are Equal: Weakly-Supervised Video Grounding with Contextual Similarity and Visual Clustering Los… ☆30 · Jun 29, 2020 · Updated 5 years ago
- Code for ACL 2020 paper "Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA." Hyounghun Kim, Zineng T… ☆34 · May 14, 2020 · Updated 5 years ago
- Pytorch implementation of https://arxiv.org/pdf/1909.10470.pdf ☆32 · Aug 23, 2021 · Updated 4 years ago
- ☆36 · Apr 14, 2021 · Updated 4 years ago
- Photorealism model use RealVisXL v4.0 ☆12 · Feb 20, 2024 · Updated last year
- [NeurIPS'25 Spotlight] This is the official codebase for the paper: STAR: A Benchmark for Astronomical Star Fields Super-Resolution ☆15 · Oct 9, 2025 · Updated 4 months ago
- Code for ACM MM2020 paper: Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization ☆34 · Sep 3, 2020 · Updated 5 years ago
- MAC: Mining Activity Concepts for Language-based Temporal Localization ☆36 · Nov 26, 2018 · Updated 7 years ago
- Revisiting Anchor Mechanisms for Temporal Action Localization (TIP 2020) ☆36 · Sep 26, 2021 · Updated 4 years ago
- Pytorch implementation of "Learning Deep Structure-Preserving Image-Text Embeddings" ☆37 · Jan 2, 2020 · Updated 6 years ago
- Fusional approaches for temporal action localization in untrimmed videos ☆35 · Mar 17, 2023 · Updated 2 years ago