allenai / pixmo-docs
ACL 2025: Synthetic data generation pipelines for text-rich images.
☆137 · Updated 7 months ago
Alternatives and similar repositories for pixmo-docs
Users who are interested in pixmo-docs are comparing it to the repositories listed below.
- [ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆172 · Updated 6 months ago
- ☆74 · Updated last year
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆91 · Updated last year
- [EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs ☆55 · Updated last month
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆90 · Updated 11 months ago
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding" ☆52 · Updated 10 months ago
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too… ☆326 · Updated last month
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement