joanrod / ocr-vqgan
OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in the Paper2Fig100k dataset. Implementation of an OCR perceptual loss for clear text-within-image generation. Forked from VQGAN in CompVis/taming-transformers.
☆72 Updated last year
Related projects:
- Official implementation of High Fidelity Scene Text Synthesis. ☆33 Updated 3 weeks ago
- Towards Flexible Multi-modal Document Models [Inoue+, CVPR2023] ☆55 Updated last year
- The official PyTorch implementation for arXiv'23 paper 'LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer' ☆68 Updated 11 months ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆58 Updated last week
- ☆77 Updated last year
- Simple script to compute CLIP-based scores given a DALL-e trained model. ☆30 Updated 3 years ago
- Code and Models for "GeneCIS: A Benchmark for General Conditional Image Similarity" ☆54 Updated last year
- Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023 ☆23 Updated last year
- ☆21 Updated last year
- Official PyTorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024) ☆73 Updated last month
- The official codes and datasets for Artistic Text Segmentation (ECCV 2024). ☆16 Updated 2 months ago
- ☆31 Updated 3 months ago
- ☆34 Updated last month
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆42 Updated 9 months ago
- ☆58 Updated 10 months ago
- ☆29 Updated last year
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆25 Updated 2 months ago
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆85 Updated 5 months ago
- This project provides a data set with bounding boxes, body poses, 3D face meshes & captions of people from our LAION-2.2B. Additionally i… ☆13 Updated 2 years ago
- ☆50 Updated 2 years ago
- OpenCOLE: Towards Reproducible Automatic Graphic Design Generation [Inoue+, CVPRW2024 (GDUG)] ☆41 Updated last week
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ☆54 Updated last year
- [ECCV2024][ICCV2023] Official PyTorch implementation of SeiT++ and SeiT ☆51 Updated last month
- ☆46 Updated 10 months ago
- ALIGN trained on COYO-dataset ☆28 Updated 4 months ago
- Official implementation of Generative Colorization of Structured Mobile Web Pages, WACV 2023. ☆21 Updated 9 months ago
- ☆24 Updated 3 years ago
- Visual Instruction-guided Explainable Metric. Code for "Towards Explainable Metrics for Conditional Image Synthesis Evaluation" (ACL 2024… ☆22 Updated last month
- FuseCap: Large Language Model for Visual Data Fusion in Enriched Caption Generation ☆47 Updated 5 months ago
- The official code for “DeepEraser: Deep Iterative Context Mining for Generic Text Eraser”, TMM, 2024. ☆27 Updated 3 weeks ago