LAION-AI / OCR-ensemble
☆42 · Updated 2 years ago
Alternatives and similar repositories for OCR-ensemble
Users interested in OCR-ensemble are comparing it to the libraries listed below.
- Official implementation of "Active Image Indexing" ☆60 · Updated 2 years ago
- ☆87 · Updated 2 years ago
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model… ☆37 · Updated 2 years ago
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in the Paper2Fig100k dataset. Implementation of OCR Percept… ☆82 · Updated 3 years ago
- A dashboard for exploring timm learning rate schedulers ☆19 · Updated last year
- Official implementation of Generative Colorization of Structured Mobile Web Pages, WACV 2023. ☆22 · Updated 2 years ago
- Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in PyTorch ☆103 · Updated 2 years ago
- ☆65 · Updated 2 years ago
- A short article showing how to load PyTorch models with linear memory consumption ☆34 · Updated 3 years ago
- Timm model explorer ☆42 · Updated last year
- An interactive demo based on Segment-Anything for stroke-based painting which enables human-like painting. ☆35 · Updated 2 years ago
- [ECCV2024][ICCV2023] Official PyTorch implementation of SeiT++ and SeiT ☆56 · Updated last year
- ViT trained on COYO-Labeled-300M dataset ☆33 · Updated 3 years ago
- 1st Place Solution in Google Universal Image Embedding ☆67 · Updated 2 years ago
- Load any CLIP model with a standardized interface ☆22 · Updated 3 months ago
- ☆59 · Updated last year
- Simplify Your Visual Data Ops. Find and visualize issues with your computer vision datasets such as duplicates, anomalies, data leakage, … ☆69 · Updated 8 months ago
- Implementation for the CVPR 2023 paper "Improving Selective Visual Question Answering by Learning from Your Peers" (https://arxiv.org/abs… ☆25 · Updated 2 years ago
- Render documents on virtual paper with folds and other types of damage using Blender geometry nodes. ☆26 · Updated 2 years ago
- The official PyTorch implementation for the arXiv'23 paper "LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer" ☆102 · Updated 8 months ago
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆23 · Updated last year
- Efficiently read embeddings in a streaming fashion from any filesystem ☆104 · Updated 5 months ago
- JAX implementation of ViT-VQGAN ☆82 · Updated 3 years ago
- A Versatile Face Encoder for Zero-Shot Diffusion Model Personalization ☆24 · Updated 6 months ago
- An open source implementation of CLIP. ☆33 · Updated 3 years ago
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets. ☆160 · Updated last year
- A repository containing datasets and tools to train a watermark classifier. ☆74 · Updated 3 years ago
- Implementation of PaLI-3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger" ☆146 · Updated 2 weeks ago
- Code for the experiments in "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" ☆102 · Updated last year
- Official repository for the paper "End-to-End Visual Editing with a Generatively Pre-Trained Artist", which was accepted at ECCV 2022. Her… ☆29 · Updated 3 years ago