Yuliang-Liu / VimTS
VimTS: A Unified Video and Image Text Spotter
☆77Updated 8 months ago
Alternatives and similar repositories for VimTS
Users that are interested in VimTS are comparing it to the libraries listed below
- ECCV2024_Parrot Captions Teach CLIP to Spot Text☆66Updated 11 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model☆42Updated last year
- [CVPR 2025] DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding☆19Updated 4 months ago
- [arXiv: 2505.12307] LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?☆24Updated 2 months ago
- [ICCV2025] A Token-level Text Image Foundation Model for Document Understanding☆111Updated last week
- [ECCV 2024] Official repo for UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diff…☆231Updated 5 months ago
- [CVPR2025] Official implementation of High Fidelity Scene Text Synthesis.☆67Updated 4 months ago
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)☆137Updated 6 months ago
- ☆15Updated last week
- [AAAI2025 Oral] Predicting the Original Appearance of Damaged Historical Documents☆88Updated 3 weeks ago
- ☆99Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension☆26Updated last year
- ☆73Updated last year
- Official Repo of Graphist☆124Updated last year
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆14Updated 8 months ago
- ☆34Updated 6 months ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆145Updated 2 weeks ago
- Codebase for the Recognize Anything Model (RAM)☆82Updated last year
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆62Updated 9 months ago
- Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)☆164Updated last year
- Image Prompter for Gradio☆92Updated last year
- ☆15Updated last week
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions☆128Updated last year
- TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes☆68Updated 3 months ago
- ☆75Updated 4 months ago
- ☆20Updated 2 years ago
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models.☆49Updated 5 months ago
- The official implementation of our paper "Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption"☆35Updated 2 months ago
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing☆69Updated last year
- Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step☆131Updated last week