joanrod / ocr-vqgan
OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in the Paper2Fig100k dataset. Implements an OCR perceptual loss for clear text-within-image generation. Forked from VQGAN in CompVis/taming-transformers.
☆76 · Updated 2 years ago
Alternatives and similar repositories for ocr-vqgan:
Users who are interested in ocr-vqgan are comparing it to the libraries listed below.
- Towards Flexible Multi-modal Document Models [Inoue+, CVPR 2023] ☆56 · Updated last year
- Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023 ☆23 · Updated last year
- The official codes and datasets for Artistic Text Segmentation (ECCV 2024). ☆23 · Updated 3 months ago
- This repository is the code of our paper "DiffUTE: Universal Text Editing Diffusion Model" (NeurIPS'2023). ☆126 · Updated 3 months ago
- The official PyTorch implementation for arXiv'23 paper 'LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer' ☆85 · Updated this week
- Official implementation of High Fidelity Scene Text Synthesis. ☆46 · Updated last month
- The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024. ☆31 · Updated 5 months ago
- ECCV 2024: Parrot Captions Teach CLIP to Spot Text ☆63 · Updated 4 months ago
- Source code of the TextLap model, an LLM for text-to-layout generation. ☆13 · Updated 3 months ago
- Official implementation of Generative Colorization of Structured Mobile Web Pages, WACV 2023. ☆22 · Updated last year
- Official code implementation for our paper: Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models. ☆25 · Updated 2 years ago
- ☆89 · Updated 5 months ago
- [MM 2023] An official implementation of the paper "One-stage Low-resolution Text Recognition with High-resolution Knowledge Transfer" ☆16 · Updated last year
- Simple script to compute CLIP-based scores given a DALL-E trained model. ☆30 · Updated 3 years ago
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆88 · Updated 10 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ☆57 · Updated last year
- ALIGN trained on COYO-dataset ☆29 · Updated 9 months ago
- Source code for paper: "AltDiffusion: A multilingual Text-to-Image diffusion model" ☆37 · Updated 11 months ago
- Official PyTorch implementation of [ACM MM 2023] Relational Contrastive Learning for Scene Text Recognition ☆17 · Updated last year
- DoodleFormer: Creative Sketch Drawing with Transformers (ECCV22) ☆25 · Updated 2 years ago
- Fine-tuning CLIP on a small image/text dataset using Hugging Face libraries ☆44 · Updated 2 years ago
- Visual Instruction-guided Explainable Metric. Code for "Towards Explainable Metrics for Conditional Image Synthesis Evaluation" (ACL 2024… ☆32 · Updated 2 months ago
- ☆22 · Updated 11 months ago
- Continuous diffusion for layout generation ☆37 · Updated 9 months ago
- ☆90 · Updated last year
- Code for AAAI 2023 paper: “Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models” ☆17 · Updated 2 years ago
- ☆79 · Updated last year
- Official PyTorch Implementation of "DiffusionPen: Towards Controlling the Style of Handwritten Text Generation" - ECCV 2024 ☆34 · Updated 3 months ago
- Official PyTorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024) ☆83 · Updated 6 months ago