Yuliang-Liu/VimTS

VimTS: A Unified Video and Image Text Spotter

☆79

Alternatives and similar repositories for VimTS

Users who are interested in VimTS are comparing it to the libraries listed below.

SCUT-DLVCLab / RFUND
[MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking f…
☆21 · Dec 4, 2024 · Updated last year
mxin262 / Bridging-Text-Spotting
(CVPR 2024) Bridging the Gap Between End-to-End and Two-Step Text Spotting.
☆75 · Jun 11, 2024 · Updated 2 years ago
Yuliang-Liu / SPTSv2
☆22 · May 30, 2023 · Updated 3 years ago
MAEHCM / ICL-D3IE
Code for ICCV 2023 Paper : “ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction”
☆54 · Aug 8, 2023 · Updated 2 years ago
mxin262 / ESTextSpotter
(ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
☆78 · Apr 9, 2024 · Updated 2 years ago
Yuliang-Liu / Open-Oracle
AI-assisted Deciphering Oracle Bone Script
☆87 · Jul 6, 2026 · Updated 2 weeks ago
Hxyz-123 / GoMatching
[NeurIPS'24] GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching
☆34 · May 29, 2025 · Updated last year
SCUT-DLVCLab / OCR-Reasoning
[ICLR 2026] OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning
☆76 · May 26, 2026 · Updated 2 months ago
ViTAE-Transformer / SAMText
The official repo for the technical report "Scalable Mask Annotation for Video Text Spotting"
☆16 · May 3, 2023 · Updated 3 years ago
weijiawu / TransDETR
[IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer
☆114 · Mar 28, 2024 · Updated 2 years ago
shannanyinxiang / UPOCR
Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024)
☆69 · Jun 6, 2024 · Updated 2 years ago
zzyhlyoko / DCTC
☆42 · Sep 2, 2023 · Updated 2 years ago
Wei-ucas / TPSNet
☆28 · Nov 29, 2023 · Updated 2 years ago
ayumiymk / DiG
Official PyTorch implementation of `Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition`
☆74 · Feb 27, 2023 · Updated 3 years ago
shannanyinxiang / SPTS
Official implementation of SPTS: Single-Point Text Spotting (ACM MM 2022 Oral)
☆145 · Jul 26, 2023 · Updated 3 years ago
Yuliang-Liu / MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
☆873 · Updated this week
shi-yx / URaG
Official implementation of URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding (AAAI 2026…
☆43 · Feb 4, 2026 · Updated 5 months ago
shannanyinxiang / PageNet
Official implementation of PageNet (IJCV 2022)
☆82 · Oct 31, 2022 · Updated 3 years ago
weijiawu / BOVText-Benchmark
[NeurIPS2021] BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting
☆71 · Oct 9, 2023 · Updated 2 years ago
wenwenyu / TCM
Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI)
☆202 · Jun 17, 2024 · Updated 2 years ago
HCIILAB / LAST
Read Ten Lines at One Glance: Line-Aware Semi-Autoregressive Transformer for Multi-Line Handwritten Mathematical Expression Recognition
☆28 · Aug 29, 2023 · Updated 2 years ago
DrLuo / SemiETS
[CVPR 2025] SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting
☆17 · Jul 1, 2025 · Updated last year
MelosY / CAM
☆27 · Feb 20, 2024 · Updated 2 years ago
Yuliang-Liu / Monkey
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)
☆1,950 · Jun 2, 2026 · Updated last month
bytedance / SPTSv2
The official implementation of SPTS v2: Single-Point Text Spotting
☆138 · Jun 29, 2023 · Updated 3 years ago
yeungchenwa / HDR
[AAAI2025 Oral] Predicting the Original Appearance of Damaged Historical Documents
☆111 · Jun 28, 2026 · Updated 3 weeks ago
bytedance / VTVQA
Towards Video Text Visual Question Answering: Benchmark and Baseline
☆41 · Feb 26, 2024 · Updated 2 years ago
xdxie / WordArt
The official code of CornerTransformer (ECCV 2022, Oral) on top of MMOCR.
☆148 · Mar 6, 2023 · Updated 3 years ago
google-research-datasets / hiertext
The HierText dataset contains ~12k images from the Open Images dataset v6 with large amount of text entities. We provide word, line and p…
☆316 · Dec 2, 2024 · Updated last year
mustache-dev / SplineCamera
a SplineCamera react component
☆14 · Feb 18, 2024 · Updated 2 years ago
Yuliang-Liu / bezier_curve_text_spotting
A PyTorch implementation of "ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network" (CVPR 2020 oral)
☆433 · Apr 28, 2022 · Updated 4 years ago
Y-ichen / FlexiFilm
FlexiFilm: Long Video Generation with Flexible Conditions
☆31 · May 1, 2024 · Updated 2 years ago
harrytea / TGDoc
"Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023
☆16 · Nov 28, 2024 · Updated last year
Yuliang-Liu / TIoU-metric
Tightness-aware Evaluation Protocol for Scene Text Detection (CVPR 2019)
☆213 · Oct 25, 2019 · Updated 6 years ago
MiliLab / LogicOCR
[arXiv: 2505.12307] LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?
☆35 · Dec 1, 2025 · Updated 7 months ago
mlpc-ucsd / TESTR
(CVPR 2022) Text Spotting Transformers
☆192 · Jan 30, 2023 · Updated 3 years ago
Canjie-Luo / Real-300K
The dataset used in the CVPR 2022 paper (SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Norm…
☆34 · Jun 21, 2022 · Updated 4 years ago
rynmurdock / inanimate
Generate images from an initial frame and text
☆37 · Jul 28, 2023 · Updated 2 years ago
SCUT-DLVCLab / Document-AI-Recommendations
Algorithms, papers, datasets, performance comparisons for Document AI.
☆209 · Mar 1, 2025 · Updated last year