ACL 2025: Synthetic data generation pipelines for text-rich images.
☆158Mar 1, 2025Updated last year
Alternatives and similar repositories for pixmo-docs
Users that are interested in pixmo-docs are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking f…☆21Dec 4, 2024Updated last year
- Read Ten Lines at One Glance: Line-Aware Semi-Autoregressive Transformer for Multi-Line Handwritten Mathematical Expression Recognition☆28Aug 29, 2023Updated 2 years ago
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.☆26Feb 22, 2024Updated 2 years ago
- ☆159May 8, 2025Updated 11 months ago
- A dataset of scientific vector graphics in TikZ for training generative models.☆26Feb 4, 2026Updated 2 months ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click and start building anything your business needs.
- [MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models☆54Oct 20, 2024Updated last year
- DatasetImgLabeler is a image annotation tool for researchers to prepare datasets in ICDAR2015 format☆12Dec 7, 2019Updated 6 years ago
- [EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs☆60Aug 25, 2025Updated 7 months ago
- -☆24Oct 25, 2022Updated 3 years ago
- [ICCV 2023] Code base for Revisiting Scene Text Recognition: A Data Perspective☆204Nov 1, 2023Updated 2 years ago
- Repo for the paper: Towards Few-shot Entity Recognition in Document Images:A Label-aware Sequence-to-Sequence Framework☆14May 31, 2023Updated 2 years ago
- [ICLR 2026] P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark☆50Jun 6, 2025Updated 10 months ago
- (ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer☆78Apr 9, 2024Updated 2 years ago
- Data and code for ACL 2023 paper "RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations"☆15Feb 8, 2024Updated 2 years ago
- Wordpress hosting with auto-scaling on Cloudways • AdFully Managed hosting built for WordPress-powered businesses that need reliable, auto-scalable hosting. Cloudways SafeUpdates now available.
- The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.☆285Sep 26, 2025Updated 6 months ago
- Code for Paper: Harnessing Webpage Uis For Text Rich Visual Understanding☆53Dec 12, 2024Updated last year
- Official Implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023☆110Oct 24, 2023Updated 2 years ago
- Code for the Molmo Vision-Language Model☆896Dec 12, 2024Updated last year
- Transformer OCR is a Optical Character Recognition tookit built for researchers working on both OCR for both Vietnamese and English. This…☆10Dec 27, 2021Updated 4 years ago
- ☆36Oct 7, 2023Updated 2 years ago
- (ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale☆49Jun 4, 2025Updated 10 months ago
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.☆954Mar 19, 2025Updated last year
- Release for CHART annotation tools used for ICDAR CHART 2019 competition☆28Sep 15, 2023Updated 2 years ago
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- The dataset used in the CVPR 2022 paper (SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Norm…☆34Jun 21, 2022Updated 3 years ago
- ☆43May 29, 2025Updated 10 months ago
- ☆32Dec 18, 2025Updated 3 months ago
- Official code implementation of Slow Perception:Let's Perceive Geometric Figures Step-by-step☆159Jul 28, 2025Updated 8 months ago
- Control LLM☆22Apr 6, 2025Updated last year
- An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Informat…☆53Jan 9, 2024Updated 2 years ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆159Dec 6, 2024Updated last year
- Empowering Unified MLLM with Multi-granular Visual Generation☆130Jan 16, 2025Updated last year
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆102Oct 23, 2024Updated last year
- NordVPN Special Discount Offer • AdSave on top-rated NordVPN 1 or 2-year plans with secure browsing, privacy protection, and support for for all major platforms.
- ☆14Apr 14, 2025Updated 11 months ago
- [ICCV 2025] TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation☆39Nov 27, 2024Updated last year
- ☆70Jan 9, 2024Updated 2 years ago
- Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)☆126Nov 13, 2023Updated 2 years ago
- This repository is maintained to release dataset and models for multimodal puzzle reasoning.☆113Feb 26, 2025Updated last year
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Jul 6, 2024Updated last year
- This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]☆79Jul 1, 2025Updated 9 months ago