Yuliang-Liu / VimTS
VimTS: A Unified Video and Image Text Spotter
☆77 · Updated 6 months ago
Alternatives and similar repositories for VimTS:
Users who are interested in VimTS are comparing it to the libraries listed below.
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆66 · Updated 8 months ago
- Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024) ☆51 · Updated 11 months ago
- [CVPR2025] Official implementation of High Fidelity Scene Text Synthesis. ☆61 · Updated last month
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆42 · Updated 9 months ago
- ☆73 · Updated last year
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆60 · Updated 6 months ago
- The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024. ☆35 · Updated 8 months ago
- ☆28 · Updated 3 months ago
- [AAAI2025 Oral] Predicting the Original Appearance of Damaged Historical Documents ☆77 · Updated last month
- ☆95 · Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆26 · Updated last year
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆50 · Updated 4 months ago
- Codebase for the Recognize Anything Model (RAM) ☆78 · Updated last year
- ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting ☆36 · Updated last month
- ☆29 · Updated 8 months ago
- (CVPR 2024) Bridging the Gap Between End-to-End and Two-Step Text Spotting. ☆62 · Updated 11 months ago
- Vary-tiny codebase built on LAVIS (for training from scratch) and PDF image-text pair data (about 600k pairs, including English/Chinese) ☆82 · Updated 7 months ago
- Dreambooth (LoRA) with a well-organized code structure. Naive adaptation from 🤗 Diffusers. ☆13 · Updated last year
- The official repository for the RealSyn dataset ☆32 · Updated last week
- [ECCV 2024] Official repo for UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diff… ☆224 · Updated 2 months ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆39 · Updated 7 months ago
- ☆22 · Updated 4 months ago
- Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step ☆126 · Updated 2 months ago
- Official repository for the paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770). ☆156 · Updated 7 months ago
- [NeurIPS'24] GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching ☆23 · Updated 4 months ago
- ☆32 · Updated 3 months ago
- Official repo for: SuperEdit - Rectifying and Facilitating Supervision for Instruction-Based Image Editing ☆59 · Updated this week
- ☆88 · Updated 4 months ago
- ☆56 · Updated last year
- Gradio demo used in our Osprey: Pixel Understanding with Visual Instruction Tuning. ☆15 · Updated last year