joanrod / ocr-vqgan
OCR-VQGAN is a discrete image encoder (tokenizer and detokenizer) for figure images in the Paper2Fig100k dataset. It implements an OCR perceptual loss for generating clear text within images. Forked from VQGAN in CompVis/taming-transformers.
☆81 · Updated 2 years ago
Alternatives and similar repositories for ocr-vqgan:
Users who are interested in ocr-vqgan compare it to the repositories listed below.
- Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023 ☆23 · Updated last year
- [CVPR 2023 highlight] Towards Flexible Multi-modal Document Models ☆56 · Updated last year
- ☆80 · Updated 2 years ago
- BTS: A Bi-lingual Benchmark for Text Segmentation in the Wild ☆29 · Updated last year
- This repository is the code of our paper "DiffUTE: Universal Text Editing Diffusion Model" (NeurIPS'2023). ☆128 · Updated this week
- ☆13 · Updated 3 months ago
- ☆56 · Updated last year
- Code for AAAI 2023 Paper: “Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models” ☆17 · Updated 2 years ago
- The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024. ☆35 · Updated 7 months ago
- The official PyTorch implementation for arXiv'23 paper 'LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer' ☆92 · Updated 2 months ago
- ☆92 · Updated 8 months ago
- Using pretrained encoder and language models to generate captions from multimedia inputs. ☆96 · Updated 2 years ago
- Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models, EMNLP 2023 ☆45 · Updated 10 months ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆65 · Updated 7 months ago
- Simple script to compute CLIP-based scores given a DALL-e trained model. ☆30 · Updated 3 years ago
- A huge dataset for Document Visual Question Answering ☆16 · Updated 8 months ago
- Evaluating GPT-4o's image generation and editing ability in OCR tasks. ☆39 · Updated last week
- ALIGN trained on COYO-dataset ☆29 · Updated 11 months ago
- Official code implementation for our paper -- Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models. ☆25 · Updated 2 years ago
- Official implementation of ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining (AAAI 20… ☆49 · Updated 9 months ago
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua… ☆56 · Updated 3 weeks ago
- Official implementation of Generative Colorization of Structured Mobile Web Pages, WACV 2023. ☆22 · Updated last year
- (CVPR 2024) Bridging the Gap Between End-to-End and Two-Step Text Spotting. ☆60 · Updated 10 months ago
- Official PyTorch Implementation of "WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models" - ICDAR 2023 ☆80 · Updated 9 months ago
- Official Pytorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024) ☆83 · Updated 2 months ago
- Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024) ☆51 · Updated 10 months ago
- PEAN: A Diffusion-Based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution (ACMMM 2024) ☆39 · Updated 4 months ago
- ☆80 · Updated last month
- The official codes and datasets for Artistic Text Segmentation (ECCV 2024). ☆25 · Updated 6 months ago
- Text Image Inpainting via Global Structure-Guided Diffusion Models (Accepted by AAAI-24) ☆65 · Updated 2 weeks ago