ACL 2025: Synthetic data generation pipelines for text-rich images.
☆162Mar 1, 2025Updated last year
Alternatives and similar repositories for pixmo-docs
Users that are interested in pixmo-docs are comparing it to the libraries listed below.
- [MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking f…☆21Dec 4, 2024Updated last year
- Read Ten Lines at One Glance: Line-Aware Semi-Autoregressive Transformer for Multi-Line Handwritten Mathematical Expression Recognition☆28Aug 29, 2023Updated 2 years ago
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.☆26Feb 22, 2024Updated 2 years ago
- ☆161May 8, 2025Updated last year
- [MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models☆55Oct 20, 2024Updated last year
- DatasetImgLabeler is an image annotation tool for researchers to prepare datasets in ICDAR2015 format☆12Dec 7, 2019Updated 6 years ago
- [EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs☆61Aug 25, 2025Updated 8 months ago
- [ICCV 2023] Code base for Revisiting Scene Text Recognition: A Data Perspective☆205Nov 1, 2023Updated 2 years ago
- [ACM MM25] LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models☆22Mar 29, 2025Updated last year
- Data and code for ACL 2023 paper "RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations"☆15Feb 8, 2024Updated 2 years ago
- The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.☆288Sep 26, 2025Updated 7 months ago
- ☆16May 15, 2025Updated last year
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding"☆54Dec 12, 2024Updated last year
- Code for the Molmo Vision-Language Model☆905Dec 12, 2024Updated last year
- ☆38Oct 7, 2023Updated 2 years ago
- Transformer OCR is an Optical Character Recognition toolkit built for researchers working on OCR for both Vietnamese and English. This…☆10Dec 27, 2021Updated 4 years ago
- (ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale☆49Jun 4, 2025Updated 11 months ago
- MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.☆953Mar 19, 2025Updated last year
- Release for CHART annotation tools used for ICDAR CHART 2019 competition☆29Sep 15, 2023Updated 2 years ago
- The dataset used in the CVPR 2022 paper (SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Norm…☆34Jun 21, 2022Updated 3 years ago
- ☆44May 29, 2025Updated 11 months ago
- Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step☆160Jul 28, 2025Updated 9 months ago
- Control LLM☆23Apr 6, 2025Updated last year
- Testbed for multimodal retrieval augmented generation techniques with FiftyOne, LlamaIndex, and Milvus☆21Aug 9, 2024Updated last year
- An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Informat…☆53Jan 9, 2024Updated 2 years ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆159Dec 6, 2024Updated last year
- Empowering Unified MLLM with Multi-granular Visual Generation☆132Jan 16, 2025Updated last year
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆103Oct 23, 2024Updated last year
- ☆32Dec 18, 2025Updated 5 months ago
- Convert datasets from Hugging Face to FiftyOne for Visualization☆11Mar 15, 2024Updated 2 years ago
- [ICCV 2025] TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation☆40Nov 27, 2024Updated last year
- Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)☆126Nov 13, 2023Updated 2 years ago
- ☆70Jan 9, 2024Updated 2 years ago
- This repository is maintained to release dataset and models for multimodal puzzle reasoning.☆115Feb 26, 2025Updated last year
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Jul 6, 2024Updated last year
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]☆79Jul 1, 2025Updated 10 months ago
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations☆138Sep 28, 2025Updated 7 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆17Dec 19, 2024Updated last year
- ☆15Apr 14, 2025Updated last year