ChenyuGAO-CS/SMA

The imdb files with SBD-Trans OCR for TextVQA dataset.

☆11

Alternatives and similar repositories for SMA

Users that are interested in SMA are comparing it to the libraries listed below.

ronghanghu / vqa-maskrcnn-benchmark-m4c
Used in M4C feature extraction script: https://github.com/facebookresearch/mmf/blob/project/m4c/projects/M4C/scripts/extract_ocr_frcn_fea…
☆13 · Jan 30, 2020 · Updated 6 years ago
ZephyrZhuQi / ssbaseline
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps [AAAI2021]
☆57 · Apr 5, 2022 · Updated 4 years ago
microsoft / TAP
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)
☆72 · May 22, 2023 · Updated 3 years ago
ronghanghu / mmf
A modular framework for Visual Question Answering research by the FAIR A-STAR team
☆45 · Aug 26, 2021 · Updated 4 years ago
uakarsh / latr
Implementation of LaTr: Layout-aware transformer for scene-text VQA, a novel multimodal architecture for Scene Text Visual Question Answer…
☆56 · Updated this week
HenryJunW / TAG
☆22 · Dec 8, 2022 · Updated 3 years ago
furkanbiten / stvqa_amazon_ocr
STVQA and TextVQA OCR results from Amazon Text in Image pipeline
☆12 · Jul 18, 2022 · Updated 4 years ago
xiaojino / RUArt
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
☆10 · Nov 27, 2022 · Updated 3 years ago
xinke-wang / Awesome-Text-VQA
☆188 · May 8, 2024 · Updated 2 years ago
wzk1015 / CNMT
[AAAI 2021] Confidence-aware Non-repetitive Multimodal Transformers for TextCaps
☆24 · Mar 29, 2023 · Updated 3 years ago
yashkant / sam-textvqa
Official code for paper "Spatially Aware Multimodal Transformers for TextVQA" published at ECCV, 2020.
☆65 · Sep 15, 2021 · Updated 4 years ago
HAWLYQ / Qc-TextCap
☆16 · Dec 25, 2021 · Updated 4 years ago
malaysia-ai / dataset
Recipes to prepare datasets!
☆15 · Jun 28, 2026 · Updated 3 weeks ago
guanghuixu / AnchorCaptioner
☆30 · May 7, 2021 · Updated 5 years ago
Gitsamshi / WeakVRD-Captioning
Implementation of paper "Improving Image Captioning with Better Use of Caption"
☆33 · Sep 15, 2020 · Updated 5 years ago
aurooj / WSG-VQA-VLTransformers
Weakly Supervised Grounding for VQA in Vision-Language Transformers
☆17 · May 6, 2023 · Updated 3 years ago
nttmdlab-nlp / VisualMRC
VisualMRC: Machine Reading Comprehension on Document Images (AAAI2021)
☆57 · Mar 31, 2025 · Updated last year
lucasnfe / puct-music-emotion
☆15 · Nov 28, 2022 · Updated 3 years ago
matteoferrante / semantic-brain-decoding
☆12 · Jan 27, 2023 · Updated 3 years ago
Karhdo / IS207.M12.HTCL
Web application development
☆13 · Jan 7, 2022 · Updated 4 years ago
usydnlp / vdoc
☆15 · Sep 7, 2022 · Updated 3 years ago
guanghuixu / CRN_tvqa
☆15 · Oct 27, 2020 · Updated 5 years ago
taolusi / SECURE
ACL'2024-Main: Synergetic Event Understanding: A Collaborative Approach to Cross-Document Event Coreference Resolution with Large Languag…
☆12 · Sep 19, 2025 · Updated 10 months ago
zhaojw1998 / Query-and-reArrange
Code and demo for paper: Zhao et al., "Q&A: Query-Based Representation Learning for Multi-Track Symbolic Music re-Arrangement," IJCAI 202…
☆21 · May 2, 2024 · Updated 2 years ago
gnovack / distributed-training-and-deepspeed
☆17 · Jun 19, 2023 · Updated 3 years ago
ai-systems / nli4ct
☆13 · Apr 21, 2024 · Updated 2 years ago
linjieyangsc / densecap
Dense captioning with joint inference and visual context
☆52 · Dec 25, 2018 · Updated 7 years ago
Actasidiot / EFIFSTR
[ACM MM 2020] Exploring Font-independent Features for Scene Text Recognition
☆44 · Nov 30, 2020 · Updated 5 years ago
ezeli / BUTD_model
A PyTorch implementation of "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" for image captioning.
☆48 · Nov 15, 2021 · Updated 4 years ago
yuranusduke / CMT-Convolutional-NN-Meets-ViT
Unofficial PyTorch implementation of CMT
☆13 · Jul 16, 2021 · Updated 5 years ago
ChineseYjh / DoFace
A package that makes Virtual Makeup easy.
☆19 · Jun 24, 2021 · Updated 5 years ago
marcopede / AreasOfAttention
☆10 · Apr 20, 2018 · Updated 8 years ago
Susheel-1999 / Sentence_Similarity
Package to calculate the similarity score between two sentences
☆11 · Jun 26, 2023 · Updated 3 years ago
doubleZ0108 / Computer-Vision-PKU
Computer Vision (04711432) | Peking Univ. ECE Course Materials
☆16 · Aug 15, 2022 · Updated 3 years ago
lvjianjin / TextRecognitionDataGenerator
A tool for generating CRNN training datasets, mainly for Simplified Chinese.
☆15 · Apr 19, 2022 · Updated 4 years ago
AndresPMD / StacMR
Scene Text Aware Cross Modal Retrieval (StacMR)
☆24 · Sep 3, 2021 · Updated 4 years ago
trungfinity / visc
Vietnamese spelling correction (ViSC) tool
☆12 · Dec 11, 2016 · Updated 9 years ago
bytedance / midi_melody_extraction
☆23 · Sep 27, 2023 · Updated 2 years ago
AbdullahHendy / live-translation
Real-time speech-to-text translation over WebSocket. Streams Opus or raw PCM audio from client to server for live transcription and optio…
☆16 · May 30, 2026 · Updated last month