adlnlp/mmvqa

☆19

Alternatives and similar repositories for mmvqa

Users who are interested in mmvqa are comparing it to the libraries listed below.

SCUT-DLVCLab / RFUND
[MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking f…
☆21 · Dec 4, 2024 · Updated last year
lcy0604 / QT-TextSR
This repository is the implementation of "QT-TextSR: Enhancing scene text image super-resolution via efficient interaction with text reco…
☆20 · Jul 9, 2025 · Updated last year
TenMilesLotus / DTSM
Code and data for the paper: DTSM: Toward Dense Table Structure Recognition with Text Query Encoder and Adjacent Feature Aggregator
☆13 · Apr 28, 2024 · Updated 2 years ago
lcy0604 / CTRNet-plus
The official implementation of CTRNet++.
☆15 · Dec 30, 2024 · Updated last year
HCIILAB / M5HisDoc
☆34 · Dec 18, 2025 · Updated 7 months ago
whlscut / DocLayLLM
[CVPR 2025] DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
☆30 · Dec 18, 2025 · Updated 7 months ago
furkanbiten / idl_data
OCR Annotations from Amazon Textract for Industry Documents Library
☆103 · Aug 20, 2022 · Updated 3 years ago
mxin262 / ESTextSpotter
(ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
☆78 · Apr 9, 2024 · Updated 2 years ago
Canjie-Luo / Real-300K
The dataset used in the CVPR 2022 paper (SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Norm…
☆34 · Jun 21, 2022 · Updated 4 years ago
PanguIR / MRAGSurvey
A Survey of Multimodal Retrieval-Augmented Generation
☆20 · Nov 3, 2025 · Updated 8 months ago
shi-yx / URaG
Official implementation of URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding (AAAI 2026…
☆43 · Feb 4, 2026 · Updated 5 months ago
SCUT-DLVCLab / OCR-Reasoning
[ICLR 2026] OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning
☆76 · May 26, 2026 · Updated last month
ZeningLin / ViBERTgrid-PyTorch
An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Informat…
☆53 · Jan 9, 2024 · Updated 2 years ago
SCUT-DLVCLab / GPT-4V_OCR
Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)
☆128 · Nov 13, 2023 · Updated 2 years ago
ZZZHANG-jx / DocKylin
[AAAI 2025] DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
☆36 · Jun 1, 2025 · Updated last year
Mountchicken / CTPN_CRNN_ChineseOCR_PyQt5
CTPN and CRNN based Chinese OCR, developed with PyQt5
☆22 · Sep 18, 2021 · Updated 4 years ago
HCIILAB / LAST
Read Ten Lines at One Glance: Line-Aware Semi-Autoregressive Transformer for Multi-Line Handwritten Mathematical Expression Recognition
☆28 · Aug 29, 2023 · Updated 2 years ago
shengfly / writer-identification
☆11 · Jun 3, 2025 · Updated last year
lcy0604 / CTRNet
This repository is the implementation of "Don't Forget Me: Accurate Background Recovery for Text Removal via Modeling Local-Global Contex…
☆97 · Feb 21, 2023 · Updated 3 years ago
Bureau-du-Forestier-en-chef / FMT
Forest Management Tool, a C++ library for forest planning.
☆17 · Updated this week
jfkuang / CFAM
Contrast-guided Feature Adjustment Module for Visual Information Extraction
☆30 · May 23, 2023 · Updated 3 years ago
SCUT-DLVCLab / TongGu-LLM
[EMNLP 2024] TongGu, a classical Chinese language model.
☆69 · Sep 28, 2024 · Updated last year
allenai / clarifydelphi
☆13 · Apr 24, 2024 · Updated 2 years ago
HCIILAB / SCUT-EnsText
☆69 · Apr 18, 2024 · Updated 2 years ago
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11 · May 24, 2023 · Updated 3 years ago
kyxscut / CG-GAN
Official PyTorch implementation of the CVPR 2022 paper: "Look Closer to Supervise Better: One-Shot Font Generation via Component-Based Di…
☆94 · Sep 17, 2022 · Updated 3 years ago
wangyuxin87 / PERT
PERT: A Progressively Region-based Network for Scene Text Removal (TIP2023)
☆37 · Aug 11, 2023 · Updated 2 years ago
susumuota / nano-askllm
Unofficial implementation of the Ask-LLM paper 'How to Train Data-Efficient LLMs', arXiv:2402.09668.
☆12 · Jun 19, 2024 · Updated 2 years ago
H-Freax / Awesome-Graph-RAG
This repository compiles a list of papers/resources related to the graph retrieval-augmented generation! Star⭐ the repo and follow me if …
☆10 · Dec 7, 2024 · Updated last year
thejonaslab / vonmises-icml-2023
☆10 · Jun 24, 2023 · Updated 3 years ago
RiskModellingResearch / DeepLearning_Autumn22
☆21 · Feb 1, 2023 · Updated 3 years ago
1hunters / EdgeViT
This is an unofficial PyTorch implementation of EdgeViT in "EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transform…
☆21 · May 21, 2022 · Updated 4 years ago
lcy0604 / EraseNet
☆156 · Jul 7, 2022 · Updated 4 years ago
zaixizhang / MolCode
Chemical Science 2023: An equivariant generative framework for molecular graph-structure Co-design
☆10 · Jun 18, 2023 · Updated 3 years ago
ictnlp / GMA
Code for ACL 2022 findings paper "Gaussian Multi-head Attention for Simultaneous Machine Translation"
☆11 · Mar 31, 2022 · Updated 4 years ago
althayr / Document-Layout-Parser
Parses a document (scanned or phone-captured) and returns the underlying question-answer layout structure captured by the LayoutXLM model
☆10 · Jun 14, 2021 · Updated 5 years ago
dmg-illc / uid-dialogue
A repository for the EMNLP 2021 paper "Is Information Density Uniform in Task-Oriented Dialogues?" and for the CoNLL 2021 paper "Analysin…
☆10 · Jun 17, 2024 · Updated 2 years ago
declare-lab / safety-arithmetic
☆13 · Jan 14, 2025 · Updated last year
yeungchenwa / HDR
[AAAI2025 Oral] Predicting the Original Appearance of Damaged Historical Documents
☆111 · Jun 28, 2026 · Updated 3 weeks ago