guanghuixu/CRN_tvqa

☆15

Alternatives and similar repositories for CRN_tvqa

Users who are interested in CRN_tvqa are comparing it to the repositories listed below.

yashkant / sam-textvqa
Official code for paper "Spatially Aware Multimodal Transformers for TextVQA" published at ECCV, 2020.
☆65 · Sep 15, 2021 · Updated 4 years ago
microsoft / TAP
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)
☆72 · May 22, 2023 · Updated 3 years ago
ZephyrZhuQi / ssbaseline
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps [AAAI2021]
☆57 · Apr 5, 2022 · Updated 4 years ago
ronghanghu / mmf
A modular framework for Visual Question Answering research by the FAIR A-STAR team
☆45 · Aug 26, 2021 · Updated 4 years ago
Alvin-Zeng / GCM
Graph Convolutional Module for Temporal Action Localization in Videos
☆10 · Jul 4, 2020 · Updated 6 years ago
uakarsh / latr
Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal architecture for Scene Text Visual Question Answer…
☆56 · Oct 30, 2024 · Updated last year
zfchenUnique / VID-Sentence
This repository provides the dataset introduced by our WSSTG paper
☆13 · Jul 21, 2019 · Updated 7 years ago
malaysia-ai / dataset
Recipes to prepare datasets!
☆15 · Jun 28, 2026 · Updated 3 weeks ago
zfchenUnique / WSSTG
This repository contains the main baselines introduced in WSSTG (ACL 2019).
☆57 · Jul 8, 2024 · Updated 2 years ago
wzk1015 / CNMT
[AAAI 2021] Confidence-aware Non-repetitive Multimodal Transformers for TextCaps
☆24 · Mar 29, 2023 · Updated 3 years ago
ChenyuGAO-CS / SMA
The imdb files with SBD-Trans OCR for TextVQA dataset.
☆11 · Nov 30, 2021 · Updated 4 years ago
ShouyangDong / tse-t
☆20 · Sep 28, 2020 · Updated 5 years ago
INK-USC / VisCOLL
Code and data for the project "Visually grounded continual learning of compositional semantics"
☆22 · Dec 27, 2022 · Updated 3 years ago
mengcaopku / DCNet
[ACM MM 22] Correspondence Matters for Video Referring Expression Comprehension
☆15 · Sep 4, 2022 · Updated 3 years ago
asrafulashiq / hamnet
PyTorch implementation of AAAI 2021 paper: A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization
☆42 · Apr 20, 2021 · Updated 5 years ago
AmeenAli / VideoMatch
☆14 · Jan 5, 2022 · Updated 4 years ago
crodriguezo / DORi
Public repository for DORi: Discovering Object Relationships for Moment Localization of a Natural Language Query in a Video Code accompan…
☆21 · Apr 7, 2021 · Updated 5 years ago
zbwglory / CMHSE
The code repository for "Cross-Modal and Hierarchical Modeling of Video and Text" in PyTorch
☆20 · Apr 26, 2020 · Updated 6 years ago
FuchenUSTC / AherNet
☆15 · Aug 25, 2020 · Updated 5 years ago
zoujuny / TableCell
Builds on TableBank with further annotation down to cell-level granularity, and uses object detection/segmentation to locate table cells.
☆14 · Dec 11, 2019 · Updated 6 years ago
mingdachen / TVRecap
TVRecap: A Dataset for Generating Stories with Character Descriptions
☆21 · Jun 5, 2023 · Updated 3 years ago
cdancette / vqa-cp-leaderboard
A collection of papers about VQA-CP datasets and their results
☆42 · Mar 18, 2022 · Updated 4 years ago
ChopinSharp / ref-nms
Official codebase for "Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding"
☆22 · Dec 20, 2020 · Updated 5 years ago
ikuinen / semantic_completion_network
☆26 · Aug 4, 2020 · Updated 5 years ago
dmis-lab / position-bias
EMNLP'2020: Look at the First Sentence: Position Bias in Question Answering
☆29 · Nov 4, 2020 · Updated 5 years ago
wayne980 / PolyLoss
Source code of Universal Weighting Metric Learning for Cross-Modal Matching. The paper is accepted by CVPR2020.
☆22 · Nov 2, 2022 · Updated 3 years ago
kcyu2014 / multi-model-forgetting
ICML2019 Accepted Paper. Overcoming Multi-Model Forgetting
☆14 · Jun 5, 2019 · Updated 7 years ago
zmzhang2000 / MIGCN
Official implementation for Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos
☆16 · May 23, 2023 · Updated 3 years ago
tanghaoyu258 / ACRM-for-moment-retrieval
☆27 · Aug 16, 2022 · Updated 3 years ago
youngfly11 / ReIR-WeaklyGrounding.pytorch
The official PyTorch code for "Relation-aware Instance Refinement for Weakly Supervised Visual Grounding" accepted by CVPR2021
☆28 · Oct 9, 2021 · Updated 4 years ago
vmurahari3 / visdial-diversity
Pytorch implementation of https://arxiv.org/pdf/1909.10470.pdf
☆32 · Aug 23, 2021 · Updated 4 years ago
HAWLYQ / Qc-TextCap
☆16 · Dec 25, 2021 · Updated 4 years ago
SpringerNLP / Chapter9
Chapter 9: Attention and Memory Augmented Networks
☆12 · Jul 23, 2019 · Updated 6 years ago
lucidrains / AoA-pytorch
A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering
☆43 · Nov 8, 2020 · Updated 5 years ago
hyounghk / VideoQADenseCapFrameGate-ACL2020
Code for ACL 2020 paper "Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA." Hyounghun Kim, Zineng T…
☆34 · May 14, 2020 · Updated 6 years ago
jshi31 / NAFAE
Implementation of paper "Not All Frames Are Equal: Weakly-Supervised Video Grounding with Contextual Similarity and Visual Clustering Los…
☆30 · Jun 29, 2020 · Updated 6 years ago
tgxs002 / wikiscenes
Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision. ICCV 2021.
☆43 · Apr 30, 2024 · Updated 2 years ago
kunalBhashkar / Bank-Marketing-Data-Set-Classification
Bank Marketing data classification
☆12 · Oct 2, 2020 · Updated 5 years ago
yytzsy / grounding_changing_distribution
☆36 · Apr 14, 2021 · Updated 5 years ago