wzk1015/CNMT

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/wzk1015/CNMT)

wzk1015 / CNMT

[AAAI 2021] Confidence-aware Non-repetitive Multimodal Transformers for TextCaps

☆24

Alternatives and similar repositories for CNMT

Users that are interested in CNMT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ZephyrZhuQi / ssbaseline
View on GitHub
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps[AAAI2021]
☆57Apr 5, 2022Updated 4 years ago
ChenyuGAO-CS / SMA
View on GitHub
The imdb files with SBD-Trans OCR for TextVQA dataset.
☆11Nov 30, 2021Updated 4 years ago
ronghanghu / vqa-maskrcnn-benchmark-m4c
View on GitHub
Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_fea…
☆13Jan 30, 2020Updated 6 years ago
ronghanghu / mmf
View on GitHub
A modular framework for Visual Question Answering research by the FAIR A-STAR team
☆45Aug 26, 2021Updated 4 years ago
microsoft / TAP
View on GitHub
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)
☆72May 22, 2023Updated 3 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
HAWLYQ / Qc-TextCap
View on GitHub
☆16Dec 25, 2021Updated 4 years ago
yashkant / sam-textvqa
View on GitHub
Official code for paper "Spatially Aware Multimodal Transformers for TextVQA" published at ECCV, 2020.
☆65Sep 15, 2021Updated 4 years ago
Gitsamshi / WeakVRD-Captioning
View on GitHub
Implementation of paper "Improving Image Captioning with Better Use of Caption"
☆33Sep 15, 2020Updated 5 years ago
yangxuntu / catt
View on GitHub
☆12Mar 8, 2021Updated 5 years ago
guanghuixu / CRN_tvqa
View on GitHub
☆15Oct 27, 2020Updated 5 years ago
ShiYaya / emscore
View on GitHub
Research code for CVPR 2022 paper: "EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching"
☆26Oct 20, 2022Updated 3 years ago
wh0330 / CAG_VisDial
View on GitHub
☆15Aug 13, 2020Updated 5 years ago
YihongT / HGARN
View on GitHub
[IEEE TITS 2024] Activity-aware human mobility prediction with hierarchical graph attention recurrent network.
☆25Jan 19, 2025Updated last year
furkanbiten / object-bias
View on GitHub
Let there be clock in the beach - WACV 2022
☆15Nov 15, 2021Updated 4 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
yangbang18 / MultiCapCLIP
View on GitHub
(ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
☆36Aug 8, 2024Updated last year
zinengtang / DeCEMBERT
View on GitHub
Pytorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)
☆17Jan 12, 2023Updated 3 years ago
xiaojino / RUArt
View on GitHub
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
☆10Nov 27, 2022Updated 3 years ago
xinke-wang / Awesome-Text-VQA
View on GitHub
☆188May 8, 2024Updated 2 years ago
doubledaibo / clcaption_nips2017
View on GitHub
Contrastive Learning for Image Captioning
☆51Feb 22, 2018Updated 8 years ago
yanxinzju / CSS-VQA
View on GitHub
Counterfactual Samples Synthesizing for Robust VQA
☆78Nov 24, 2022Updated 3 years ago
csitfun / ConTRoL-dataset
View on GitHub
Dataset for AAAI paper "Natural Language Inference in Context - Investigating Contextual Reasoning over Long Texts"
☆11Nov 18, 2022Updated 3 years ago
Karhdo / IS207.M12.HTCL
View on GitHub
Phát triển ứng dụng web
☆13Jan 7, 2022Updated 4 years ago
WingsBrokenAngel / MSR-VTT-DataCleaning
View on GitHub
☆19Dec 22, 2022Updated 3 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
TerminologyHub / termhub-in-5-minutes
View on GitHub
Developer project for getting basic API integrations working in under 5 minutes
☆11May 22, 2026Updated 2 months ago
Rid7 / OCR_DataSet
View on GitHub
收集并整理有关OCR的数据集并统一标注格式，以便实验需要
☆12May 17, 2023Updated 3 years ago
zoujuny / TableCell
View on GitHub
在TableBank的基础上，进一步标注到单元格精度，利用目标检测/分割实现单元格定位。
☆14Dec 11, 2019Updated 6 years ago
pzzhang / VinVL
View on GitHub
project page for VinVL
☆360Jul 26, 2023Updated 3 years ago
yonatanbitton / data_efficient_masked_language_modeling_for_vision_and_language
View on GitHub
Repository for the paper "Data Efficient Masked Language Modeling for Vision and Language".
☆18Sep 17, 2021Updated 4 years ago
baaaad / ECE
View on GitHub
[ECCV'22 Poster] Explicit Image Caption Editing
☆22Nov 30, 2022Updated 3 years ago
CogComp / Salient-Event-Detection
View on GitHub
The repository for the paper "Is Killed More Significant than Fled? A Contextual Model for Salient Event Detection"
☆10Jul 5, 2022Updated 4 years ago
jiahuei / sparse-image-captioning
View on GitHub
Image captioning with weight pruning in PyTorch
☆22Jan 14, 2022Updated 4 years ago
bearcatt / LaBERT
View on GitHub
A length-controllable and non-autoregressive image captioning model.
☆69Jun 10, 2021Updated 5 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
ai-systems / nli4ct
View on GitHub
☆13Apr 21, 2024Updated 2 years ago
due-benchmark / du-schema
View on GitHub
JSON Schema format for storing datasets details, documents processed contents, and documents annotations in the document understanding do…
☆14Nov 5, 2024Updated last year
Nan2020 / PID-Pavement-Image-Dataset
View on GitHub
☆11Feb 18, 2020Updated 6 years ago
gsoykan / comics_text_plus
View on GitHub
Official repository of the paper: "A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition"
☆26Jul 10, 2023Updated 3 years ago
ych133 / How2R-and-How2QA
View on GitHub
A video retrieval dataset How2R and a video QA dataset How2QA
☆24Oct 15, 2020Updated 5 years ago
expertailab / ISAAQ
View on GitHub
☆10Oct 1, 2020Updated 5 years ago
VSJMilewski / multimodal-probes
View on GitHub
Code base for paper "Finding Structural Knowledge in Multimodal-BERT". Framework for probing and code for creating Scene Trees.
☆10May 19, 2022Updated 4 years ago