ACL 2025: Synthetic data generation pipelines for text-rich images.
☆160Mar 1, 2025Updated last year
Alternatives and similar repositories for pixmo-docs
Users that are interested in pixmo-docs are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking f…☆21Dec 4, 2024Updated last year
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.☆26Feb 22, 2024Updated 2 years ago
- ☆160May 8, 2025Updated 11 months ago
- A dataset of scientific vector graphics in TikZ for training generative models.☆26Feb 4, 2026Updated 2 months ago
- [MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models☆55Oct 20, 2024Updated last year
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- DatasetImgLabeler is a image annotation tool for researchers to prepare datasets in ICDAR2015 format☆12Dec 7, 2019Updated 6 years ago
- [EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs☆61Aug 25, 2025Updated 8 months ago
- -☆24Oct 25, 2022Updated 3 years ago
- [ICCV 2023] Code base for Revisiting Scene Text Recognition: A Data Perspective☆204Nov 1, 2023Updated 2 years ago
- Repo for the paper: Towards Few-shot Entity Recognition in Document Images:A Label-aware Sequence-to-Sequence Framework☆14May 31, 2023Updated 2 years ago
- [ACM MM25] LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models☆22Mar 29, 2025Updated last year
- [ICLR 2026] P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark☆51Jun 6, 2025Updated 10 months ago
- (ICCV 2023) ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer☆78Apr 9, 2024Updated 2 years ago
- Data and code for ACL 2023 paper "RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations"☆15Feb 8, 2024Updated 2 years ago
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.☆285Sep 26, 2025Updated 7 months ago
- ☆16May 15, 2025Updated 11 months ago
- Code for the Molmo Vision-Language Model☆902Dec 12, 2024Updated last year
- Official Implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023☆110Oct 24, 2023Updated 2 years ago
- ☆38Oct 7, 2023Updated 2 years ago
- Transformer OCR is a Optical Character Recognition tookit built for researchers working on both OCR for both Vietnamese and English. This…☆10Dec 27, 2021Updated 4 years ago
- (ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale☆49Jun 4, 2025Updated 10 months ago
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.☆954Mar 19, 2025Updated last year
- ☆43May 29, 2025Updated 11 months ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- The dataset used in the CVPR 2022 paper (SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Norm…☆34Jun 21, 2022Updated 3 years ago
- Official code implementation of Slow Perception:Let's Perceive Geometric Figures Step-by-step☆160Jul 28, 2025Updated 9 months ago
- Control LLM☆23Apr 6, 2025Updated last year
- Testbed for multimodal retrieval augmented generation techniques with FiftyOne, LlamaIndex, and Milvus☆21Aug 9, 2024Updated last year
- An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Informat…☆53Jan 9, 2024Updated 2 years ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆159Dec 6, 2024Updated last year
- Empowering Unified MLLM with Multi-granular Visual Generation☆131Jan 16, 2025Updated last year
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆102Oct 23, 2024Updated last year
- ☆32Dec 18, 2025Updated 4 months ago
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- Convert datasets from Hugging Face to FiftyOne for Visualization☆11Mar 15, 2024Updated 2 years ago
- [ICCV 2025] TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation☆40Nov 27, 2024Updated last year
- Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)☆126Nov 13, 2023Updated 2 years ago
- ☆70Jan 9, 2024Updated 2 years ago
- This repository is maintained to release dataset and models for multimodal puzzle reasoning.☆115Feb 26, 2025Updated last year
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Jul 6, 2024Updated last year
- This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]☆79Jul 1, 2025Updated 10 months ago