ACL 2025: Synthetic data generation pipelines for text-rich images.
☆162Mar 1, 2025Updated last year
Alternatives and similar repositories for pixmo-docs
Users that are interested in pixmo-docs are comparing it to the libraries listed below.
- [MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking f…☆21Dec 4, 2024Updated last year
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.☆26Feb 22, 2024Updated 2 years ago
- ☆162May 8, 2025Updated last year
- A dataset of scientific vector graphics in TikZ for training generative models.☆27Feb 4, 2026Updated 4 months ago
- [MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models☆55Oct 20, 2024Updated last year
- DatasetImgLabeler is an image annotation tool for researchers to prepare datasets in ICDAR2015 format☆12Dec 7, 2019Updated 6 years ago
- [EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs☆61Aug 25, 2025Updated 9 months ago
- [ICCV 2023] Code base for Revisiting Scene Text Recognition: A Data Perspective☆205Nov 1, 2023Updated 2 years ago
- Repo for the paper: Towards Few-shot Entity Recognition in Document Images: A Label-aware Sequence-to-Sequence Framework☆14May 31, 2023Updated 3 years ago
- (ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer☆78Apr 9, 2024Updated 2 years ago
- [ICLR 2026] P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark☆53Jun 6, 2025Updated last year
- Data and code for ACL 2023 paper "RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations"☆15Feb 8, 2024Updated 2 years ago
- The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.☆288Sep 26, 2025Updated 8 months ago
- ☆16May 15, 2025Updated last year
- Code for Paper: Harnessing Webpage UIs For Text Rich Visual Understanding☆54Dec 12, 2024Updated last year
- Code for the Molmo Vision-Language Model☆912Dec 12, 2024Updated last year
- Official Implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023☆110Oct 24, 2023Updated 2 years ago
- ☆38Oct 7, 2023Updated 2 years ago
- Transformer OCR is an Optical Character Recognition toolkit built for researchers working on OCR for both Vietnamese and English. This…☆10Dec 27, 2021Updated 4 years ago
- (ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale☆49Jun 4, 2025Updated last year
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.☆954Mar 19, 2025Updated last year
- Release for CHART annotation tools used for ICDAR CHART 2019 competition☆29Sep 15, 2023Updated 2 years ago
- The dataset used in the CVPR 2022 paper (SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Norm…☆34Jun 21, 2022Updated 3 years ago
- Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step☆160Jul 28, 2025Updated 10 months ago
- Control LLM☆23Apr 6, 2025Updated last year
- Testbed for multimodal retrieval augmented generation techniques with FiftyOne, LlamaIndex, and Milvus☆21Aug 9, 2024Updated last year
- An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Informat…☆53Jan 9, 2024Updated 2 years ago
- Empowering Unified MLLM with Multi-granular Visual Generation☆132Jan 16, 2025Updated last year
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆103Oct 23, 2024Updated last year
- ☆33Dec 18, 2025Updated 5 months ago
- Convert datasets from Hugging Face to FiftyOne for Visualization☆11Mar 15, 2024Updated 2 years ago
- Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)☆126Nov 13, 2023Updated 2 years ago
- ☆70Jan 9, 2024Updated 2 years ago
- This repository is maintained to release dataset and models for multimodal puzzle reasoning.☆116Feb 26, 2025Updated last year
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Jul 6, 2024Updated last year
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]☆80Jul 1, 2025Updated 11 months ago
- ☆27Dec 2, 2025Updated 6 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆17Dec 19, 2024Updated last year
- Official Repository of ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning☆260Sep 26, 2024Updated last year