joanrod / ocr-vqgan
OCR-VQGAN is a discrete image encoder (tokenizer and detokenizer) for figure images in the Paper2Fig100k dataset. It implements an OCR perceptual loss for clear text-within-image generation. Forked from VQGAN in CompVis/taming-transformers.
☆81 · Updated 2 years ago
Alternatives and similar repositories for ocr-vqgan:
Users who are interested in ocr-vqgan are comparing it to the repositories listed below.
- [CVPR 2023 highlight] Towards Flexible Multi-modal Document Models ☆56 · Updated last year
- This repository is the code of our paper "DiffUTE: Universal Text Editing Diffusion Model" (NeurIPS'2023). ☆131 · Updated 3 weeks ago
- The official PyTorch implementation for arXiv'23 paper 'LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer' ☆93 · Updated 3 months ago
- Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023 ☆23 · Updated last year
- ☆80 · Updated 2 years ago
- Simple script to compute CLIP-based scores given a DALL-e trained model. ☆30 · Updated 3 years ago
- Code and Models for "GeneCIS A Benchmark for General Conditional Image Similarity" ☆58 · Updated last year
- Official Pytorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024) ☆85 · Updated 3 months ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆66 · Updated 8 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆56 · Updated 3 weeks ago
- ☆13 · Updated 3 months ago
- Diffusion-based markup-to-image generation ☆81 · Updated 2 years ago
- Official code implementation for our paper -- Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models. ☆25 · Updated 2 years ago
- ☆93 · Updated 9 months ago
- Continuous diffusion for layout generation ☆42 · Updated 2 months ago
- The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024. ☆35 · Updated 8 months ago
- Official PyTorch Implementation of "DiffusionPen: Towards Controlling the Style of Handwritten Text Generation" - ECCV 2024 ☆47 · Updated 6 months ago
- ☆133 · Updated last year
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions ☆55 · Updated last year
- BTS: A Bi-lingual Benchmark for Text Segmentation in the Wild ☆30 · Updated last year
- Evaluating GPT-4o's image generation and editing ability in OCR tasks. ☆43 · Updated last month
- Code for AAAI 2023 Paper : “Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models” ☆17 · Updated 2 years ago
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua… ☆56 · Updated last month
- Official PyTorch implementation of `[ACMMM 2023]Relational Contrastive Learning for Scene Text Recognition` ☆17 · Updated last year
- Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models, EMNLP 2023 ☆45 · Updated 10 months ago
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆26 · Updated last year
- ☆51 · Updated last year
- ☆82 · Updated last month
- Visual Instruction-guided Explainable Metric. Code for "Towards Explainable Metrics for Conditional Image Synthesis Evaluation" (ACL 2024… ☆36 · Updated 5 months ago
- ☆14 · Updated last year