joanrod / ocr-vqgan
OCR-VQGAN is a discrete image encoder (tokenizer and detokenizer) for figure images in the Paper2Fig100k dataset. It implements an OCR perceptual loss for clear text-within-image generation. Forked from VQGAN in CompVis/taming-transformers.
☆74 · Updated last year
Related projects
Alternatives and complementary repositories for ocr-vqgan
- Towards Flexible Multi-modal Document Models [Inoue+, CVPR2023] ☆56 · Updated last year
- Official implementation of High Fidelity Scene Text Synthesis. ☆36 · Updated last week
- Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023 ☆23 · Updated last year
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆61 · Updated 2 months ago
- This repository is the code of our paper "DiffUTE: Universal Text Editing Diffusion Model" (NeurIPS'2023). ☆121 · Updated last month
- Official code implementation for our paper -- Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models. ☆25 · Updated 2 years ago
- [ECCV2024][ICCV2023] Official PyTorch implementation of SeiT++ and SeiT ☆51 · Updated 3 months ago
- ☆88 · Updated 3 months ago
- Simple script to compute CLIP-based scores given a DALL-e trained model. ☆30 · Updated 3 years ago
- ☆79 · Updated last year
- The official PyTorch implementation for arXiv'23 paper 'LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer' ☆73 · Updated last year
- ☆21 · Updated last year
- Code for AAAI 2023 Paper: “Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models” ☆17 · Updated last year
- ☆13 · Updated last year
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆43 · Updated 11 months ago
- TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers ☆21 · Updated 2 years ago
- The official codes and datasets for Artistic Text Segmentation (ECCV 2024). ☆19 · Updated last month
- The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024. ☆28 · Updated 2 months ago
- Official implementation of Generative Colorization of Structured Mobile Web Pages, WACV 2023. ☆21 · Updated 11 months ago
- Visual Instruction-guided Explainable Metric. Code for "Towards Explainable Metrics for Conditional Image Synthesis Evaluation" (ACL 2024… ☆29 · Updated this week
- [CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation ☆62 · Updated 4 months ago
- FuseCap: Large Language Model for Visual Data Fusion in Enriched Caption Generation ☆49 · Updated 7 months ago
- A huge dataset for Document Visual Question Answering ☆14 · Updated 3 months ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models ☆43 · Updated 5 months ago
- [MM2023] An official implementation of the paper "One-stage Low-resolution Text Recognition with High-resolution Knowledge Transfer" ☆15 · Updated last year
- Diffusion-based markup-to-image generation ☆78 · Updated last year
- Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024) ☆76 · Updated 6 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆42 · Updated 3 weeks ago
- Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data ☆32 · Updated 8 months ago
- Training code for CLIP-FlanT5 ☆19 · Updated 3 months ago