DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems
☆68Sep 29, 2024Updated last year
Alternatives and similar repositories for DocBench
Users that are interested in DocBench are comparing it to the libraries listed below
- TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning☆23Sep 17, 2024Updated last year
- Task Compass: Scaling Multi-task Pre-training with Task Prefix (EMNLP 2022: Findings) (stay tuned & more will be updated)☆22Oct 17, 2022Updated 3 years ago
- [ICME 2023] FlowText: Synthesizing Realistic Scene Text Video with Optical Flow Estimation☆13May 13, 2023Updated 2 years ago
- Searching a High Performance Feature Extractor for Text Recognition Network. TPAMI 2022☆13Nov 25, 2022Updated 3 years ago
- Cross-lingual learning in scene text recognition (ICASSP2024)☆18Sep 29, 2024Updated last year
- This project aims to generate syntactic handwritten mathematical expressions. The dataset is generated from the CROHME 2014 training set.☆14Feb 24, 2022Updated 4 years ago
- Intuitive interface for fine-tuning and retraining a Tesseract OCR language model☆10Jul 4, 2025Updated 8 months ago
- Create handwritten word embeddings from a text recognition Seq2Seq system.☆11Dec 1, 2022Updated 3 years ago
- ☆17Jul 9, 2024Updated last year
- ☆17Jun 12, 2024Updated last year
- SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)☆105Mar 31, 2025Updated 11 months ago
- Basic HTR concepts/modules to boost performance☆39Nov 30, 2024Updated last year
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations☆124Sep 28, 2025Updated 5 months ago
- The source codes of TDv2 in paper: TDv2: A Novel Tree-Structured Decoder for Offline Mathematical Expression Recognition.☆12Jul 28, 2022Updated 3 years ago
- Official PyTorch Implementation of "Rethinking HTG Evaluation: Bridging Generation and Recognition" (Oral) - 1st Workshop on Critical Eva…☆17Sep 23, 2024Updated last year
- ☆69Jan 9, 2024Updated 2 years ago
- TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers☆21Jul 26, 2022Updated 3 years ago
- RoDLA: Benchmarking the Robustness of Document Layout Analysis Models☆39Mar 26, 2025Updated 11 months ago
- Visual and Embodied Concepts evaluation benchmark☆21Oct 10, 2023Updated 2 years ago
- PyTorch implementation of STR models for transfer learning in Indic Languages☆16Sep 20, 2021Updated 4 years ago
- This repository contains source codes for SoftCTC. Original paper can be found here: https://arxiv.org/abs/2212.02135☆19Mar 7, 2023Updated 2 years ago
- [COLM'24] "Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning"☆21Jun 14, 2024Updated last year
- A function that takes as input a cropped text line image, and outputs the dewarped image.☆21Sep 2, 2025Updated 6 months ago
- [WMT 2022] Implementation of TAL-SJTU's system for WMT22 English-Livonian☆23May 4, 2023Updated 2 years ago
- The source code of Paper "PathQG: Neural Question Generation from Facts".☆23Jan 4, 2021Updated 5 years ago
- Reimplementation of "GTC: Guided Training of CTC Towards Efficient and Accurate Scene Text Recognition"☆16Nov 10, 2020Updated 5 years ago
- Extracting LaTeX equations from PDF☆21Sep 14, 2023Updated 2 years ago
- EDSL code☆19Mar 19, 2022Updated 3 years ago
- Khmer Character Specification☆25Mar 14, 2025Updated 11 months ago
- [EMNLP 2021] Dataset and PyTorch Code for ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning☆15Nov 5, 2022Updated 3 years ago
- Code repo for EMNLP 2023 paper "Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models"☆23Nov 13, 2023Updated 2 years ago
- ☆47Dec 16, 2022Updated 3 years ago
- ☆27Feb 20, 2024Updated 2 years ago
- Diffusion based transformer, in PyTorch (Experimental).☆24Sep 13, 2022Updated 3 years ago
- Code of "Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model"☆23Jun 28, 2024Updated last year
- Project website of TE141K.☆17Mar 24, 2020Updated 5 years ago
- Official repository of the paper: "A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition"☆26Jul 10, 2023Updated 2 years ago
- [NeurIPS 2024] Code and Data Repo for Paper "Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning"☆28May 28, 2024Updated last year
- This repository is a concise collection of well known deep learning based document binarization models.☆27Dec 24, 2022Updated 3 years ago