LAION-AI / OCR-ensemble
☆41 Updated last year
Alternatives and similar repositories for OCR-ensemble
Users who are interested in OCR-ensemble are comparing it to the libraries listed below.
- Load any clip model with a standardized interface ☆21 Updated last year
- A dashboard for exploring timm learning rate schedulers ☆19 Updated 5 months ago
- Un-*** 50 billions multimodality dataset ☆24 Updated 2 years ago
- Official implementation of Generative Colorization of Structured Mobile Web Pages, WACV 2023. ☆22 Updated last year
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆21 Updated 9 months ago
- LoRA fine-tuned Stable Diffusion Deployment ☆31 Updated 2 years ago
- ALIGN trained on COYO-dataset ☆29 Updated last year
- Description and applications of OpenAI's paper about DALL-E (2021) and implementation of other (CLIP-guided) zero-shot text-to-image gene… ☆32 Updated 2 years ago
- ☆64 Updated last year
- Using open-source LLM Llama2 by Meta on local CPU inference for document question-and-answer ☆15 Updated last year
- A Versatile Face Encoder for Zero-Shot Diffusion Model Personalization ☆24 Updated this week
- An interactive demo based on Segment-Anything for stroke-based painting which enables human-like painting. ☆35 Updated 2 years ago
- ☆58 Updated last year
- JAX implementation ViT-VQGAN ☆83 Updated 2 years ago
- [ECCV2024][ICCV2023] Official PyTorch implementation of SeiT++ and SeiT ☆55 Updated 9 months ago
- An unofficial Implementation of DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents ☆36 Updated last year
- Explorations into adversarial losses on top of autoregressive loss for language modeling ☆36 Updated 2 months ago
- Contains my experiments with the `big_vision` repo to train ViTs on ImageNet-1k. ☆22 Updated 2 years ago
- Aggregating embeddings over time ☆31 Updated 2 years ago
- [CVPR 2023 highlight] Towards Flexible Multi-modal Document Models ☆56 Updated last year
- Official implementation of "Active Image Indexing" ☆59 Updated 2 years ago
- Tools for content datamining and NLP at scale ☆43 Updated 10 months ago
- Official Training and Inference Code of Amodal Expander, Proposed in Tracking Any Object Amodally ☆18 Updated 10 months ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023 ☆14 Updated 5 months ago
- An open source implementation of CLIP. ☆32 Updated 2 years ago
- EdgeSAM model for use with Autodistill. ☆26 Updated 11 months ago
- Cross-lingual learning in scene text recognition (ICASSP2024) ☆16 Updated 7 months ago
- SuperStyleNet: Deep Image Synthesis with Superpixel Based Style Encoder (BMVC 2021) ☆27 Updated 3 years ago
- A reimplementation of KOSMOS-1 from "Language Is Not All You Need: Aligning Perception with Language Models" ☆27 Updated 2 years ago
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions ☆55 Updated last year