Yuliang-Liu / VimTS
VimTS: A Unified Video and Image Text Spotter
☆72Updated 3 months ago
Related projects:
- ☆54Updated 3 weeks ago
- ☆78Updated 8 months ago
- UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models☆194Updated 2 months ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text☆58Updated last week
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆45Updated 4 months ago
- Official implementation of High Fidelity Scene Text Synthesis.☆33Updated 3 weeks ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.☆25Updated 2 months ago
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi…☆36Updated 8 months ago
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)☆124Updated 3 months ago
- The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024.☆27Updated 3 weeks ago
- ☆70Updated 6 months ago
- The official code of "RWKV-CLIP: A Robust Vision-Language Representation Learner"☆97Updated 2 months ago
- Code for the SIGGRAPH Asia 2024 paper "Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation"☆25Updated 3 weeks ago
- Precision Search through Multi-Style Inputs☆45Updated last month
- Codebase for the Recognize Anything Model (RAM)☆58Updated 9 months ago
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want☆52Updated 5 months ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆138Updated last week
- A Dead Simple and Modularized Multi-Modal Training and Finetune Framework. Compatible with any LLaVA/Flamingo/QwenVL/MiniGemini etc series …☆17Updated 4 months ago
- An open-source implementation for fine-tuning Qwen2-VL-2B and Qwen2-VL-7B.☆33Updated this week
- [CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation☆56Updated 2 months ago
- [ECCV2024] PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer☆45Updated last week
- [NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT☆129Updated 4 months ago
- Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024)