Yuliang-Liu / VimTS
VimTS: A Unified Video and Image Text Spotter
☆72 Updated last week
Related projects
Alternatives and complementary repositories for VimTS
- ☆83 Updated 10 months ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆60 Updated 2 months ago
- ☆67 Updated this week
- Official implementation of High Fidelity Scene Text Synthesis. ☆36 Updated this week
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆51 Updated 3 weeks ago
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual capability to any diffusion model without additional training) ☆127 Updated 5 months ago
- UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models ☆203 Updated 4 months ago
- ☆31 Updated this week
- A Training-free Iterative Framework for Long Story Visualization ☆62 Updated this week
- Official PyTorch Code and Models of "Minutes to Seconds: Speeded-up DDPM-based Image Inpainting with Coarse-to-Fine Sampling", ICME 2024 ☆42 Updated last month
- ☆74 Updated 8 months ago
- ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting ☆20 Updated 3 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆38 Updated 4 months ago
- Vary-tiny codebase built on LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k pairs, English and Chinese) ☆68 Updated 2 months ago
- The official code for NeurIPS 2024 paper: Harmonizing Visual Text Comprehension and Generation ☆73 Updated this week
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆39 Updated 3 months ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER" ☆142 Updated last week
- ☆88 Updated 3 months ago
- Precision Search through Multi-Style Inputs ☆54 Updated 3 months ago
- ☆25 Updated last year
- Codebase for the Recognize Anything Model (RAM) ☆64 Updated 11 months ago
- NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement ☆33 Updated 3 months ago
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆61 Updated 3 weeks ago
- Datasets and Evaluation Scripts for CompHRDoc ☆25 Updated 7 months ago
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi… ☆42 Updated 10 months ago
- The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024. ☆28 Updated 2 months ago
- A Dead Simple and Modularized Multi-Modal Training and Fine-tuning Framework. Compatible with any LLaVA/Flamingo/QwenVL/MiniGemini etc series … ☆17 Updated 6 months ago
- GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models ☆46 Updated 4 months ago
- ☆22 Updated 3 months ago
- Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024) ☆39 Updated 5 months ago