LAION-AI / OCR-ensemble
☆38 Updated last year
Related projects:
- High-Performance Transformers for Table Structure Recognition Need Early Convolutions ☆38 Updated 5 months ago
- ☆64 Updated 11 months ago
- Load any clip model with a standardized interface ☆21 Updated 4 months ago
- A dashboard for exploring timm learning rate schedulers ☆18 Updated last year
- A Versatile Face Encoder for Zero-Shot Diffusion Model Personalization ☆18 Updated this week
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model… ☆33 Updated 11 months ago
- WikiTableSet: A largest publicly available image-based table recognition dataset in three languages built from Wikipedia ☆23 Updated last year
- Contrast-guided Feature Adjustment Module for Visual Information Extraction ☆28 Updated last year
- ☆55 Updated 3 months ago
- Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta ☆15 Updated last week
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept… ☆72 Updated last year
- ☆56 Updated 6 months ago
- ☆84 Updated 8 months ago
- Description and applications of OpenAI's paper about DALL-E (2021) and implementation of other (CLIP-guided) zero-shot text-to-image gene… ☆29 Updated 2 years ago
- JAX implementation ViT-VQGAN ☆77 Updated last year
- Un-*** 50 billions multimodality dataset ☆24 Updated 2 years ago
- ☆13 Updated last year
- Aggregating embeddings over time ☆31 Updated last year
- Official implementation of Generative Colorization of Structured Mobile Web Pages, WACV 2023. ☆21 Updated 9 months ago
- Video descriptions of research papers relating to foundation models and scaling ☆29 Updated last year
- A repository containing datasets and tools to train a watermark classifier. ☆58 Updated 2 years ago
- ☆24 Updated this week
- LoRA fine-tuned Stable Diffusion Deployment ☆31 Updated last year
- Timm model explorer ☆36 Updated 5 months ago
- An unofficial Implementation of DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents ☆32 Updated last year
- Contains my experiments with the `big_vision` repo to train ViTs on ImageNet-1k. ☆22 Updated last year
- Towards Flexible Multi-modal Document Models [Inoue+, CVPR2023] ☆55 Updated last year
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆101 Updated last year
- My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model" ☆68 Updated last week
- Code for AAAI 2023 Paper: “Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models” ☆17 Updated last year