joanrod / ocr-vqgan
OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in the Paper2Fig100k dataset. Implementation of OCR Perceptual loss for clear text-within-image generation. Forked from VQGAN in CompVis/taming-transformers
☆81Updated 2 years ago
Alternatives and similar repositories for ocr-vqgan
Users who are interested in ocr-vqgan are comparing it to the libraries listed below.
- Official code implementation for our paper -- Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models.☆25Updated 2 years ago
- This repository is the code of our paper "DiffUTE: Universal Text Editing Diffusion Model" (NeurIPS'2023).☆132Updated last month
- [CVPR 2023 highlight] Towards Flexible Multi-modal Document Models☆57Updated last year
- BTS: A Bi-lingual Benchmark for Text Segmentation in the Wild☆30Updated last year
- Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models, EMNLP 2023☆46Updated 11 months ago
- Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023☆25Updated last year
- ☆80Updated 2 years ago
- The official PyTorch implementation for arXiv'23 paper 'LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer'☆94Updated 3 weeks ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text☆66Updated 8 months ago
- ☆13Updated 4 months ago
- ☆93Updated 10 months ago
- ☆57Updated last year
- Dreambooth (LoRA) with well-organized code structure. Naive adaptation from 🤗Diffusers.☆12Updated 2 years ago
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions☆55Updated last year
- Evaluating GPT-4o's image generation and editing ability in OCR tasks.☆47Updated 2 months ago
- Diffusion-based markup-to-image generation☆81Updated 2 years ago
- The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024.☆39Updated 9 months ago
- [ECCV2024][ICCV2023] Official PyTorch implementation of SeiT++ and SeiT☆55Updated 9 months ago
- Simple script to compute CLIP-based scores given a DALL-e trained model.☆30Updated 3 years ago
- Official PyTorch Implementation of "WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models" - ICDAR 2023☆80Updated 11 months ago
- ☆24Updated last year
- ☆27Updated 4 years ago
- ALIGN trained on COYO-dataset☆29Updated last year
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆57Updated last month
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"☆90Updated last year
- ☆81Updated 2 months ago
- Cheng-Fu Yang*, Wan-Cyuan Fan*, Fu-En Yang, Yu-Chiang Frank Wang, "LayoutTransformer: Scene Layout Generation with Conceptual and Spatial…☆59Updated 3 years ago
- Text-To-Image Generation with Chinese Characters☆131Updated last year
- ☆20Updated 10 months ago
- Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024)☆53Updated last year