Yuliang-Liu / VimTS
VimTS: A Unified Video and Image Text Spotter
☆79Updated last year
Alternatives and similar repositories for VimTS
Users that are interested in VimTS are comparing it to the libraries listed below
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model☆42Updated last year
- ECCV2024_Parrot Captions Teach CLIP to Spot Text☆66Updated last year
- Official Repo of Graphist☆128Updated last year
- [CVPR 2025] DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding☆22Updated 8 months ago
- [AAAI2025 Oral] Predicting the Original Appearance of Damaged Historical Documents☆93Updated 4 months ago
- [CVPR2025] Official implementation of High Fidelity Scene Text Synthesis.☆78Updated 8 months ago
- [ECCV 2024] Official repo for UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diff…☆233Updated 9 months ago
- ☆29Updated last year
- ☆99Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension☆28Updated last year
- [ICCV2025] A Token-level Text Image Foundation Model for Document Understanding☆126Updated 3 months ago
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)☆143Updated 10 months ago
- ☆17Updated 4 months ago
- [arXiv: 2505.12307] LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?☆34Updated last week
- ☆46Updated 10 months ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆16Updated last year
- InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions☆132Updated last year
- ☆32Updated 2 years ago
- ☆93Updated 9 months ago
- ☆99Updated 11 months ago
- ☆75Updated last year
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆64Updated last year
- Image Prompter for Gradio☆92Updated last year
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models.☆50Updated 9 months ago
- Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation☆144Updated last year
- The SAIL-VL2 series model developed by the BytedanceDouyinContent Group☆75Updated 2 months ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆146Updated last month
- ☆57Updated last year
- ☆99Updated last year
- ☆35Updated 10 months ago