wzlxjtu / PDF2LaTeX-dataset
☆21Updated 4 years ago
Alternatives and similar repositories for PDF2LaTeX-dataset
Users that are interested in PDF2LaTeX-dataset are comparing it to the libraries listed below
- Scanning Single Shot Detector for Math in Document Images☆130Updated 2 years ago
- A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-Domain Evaluation Framework for Academic Documents☆25Updated 2 years ago
- ☆43Updated 2 years ago
- Logical structure analysis for visually structured documents☆90Updated 2 years ago
- ☆9Updated 5 years ago
- A GPT-based generative LM for combined text and math formulas, leveraging tree-based formula encoding.☆40Updated last year
- Implementation of the SOTA Transformer architecture from PaLM - Scaling Language Modeling with Pathways in JAX/Flax☆13Updated 3 years ago
- Solution to im2latex request for research of openai☆90Updated last year
- ☆17Updated last year
- transformer based OCR framework used to train OCR or image to latex☆9Updated 2 years ago
- DocBankLoader is a dataset loader for DocBank, and can convert DocBank to the Object Detection models' format.☆24Updated 4 years ago
- Handwritten mathematical symbols recognition with TrOCR☆18Updated last year
- Question Answering dataset generator of Document Visual in English and Chinese☆24Updated 2 years ago
- Training a reward model for RLHF using RWKV.☆14Updated 2 years ago
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis☆351Updated 2 years ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection☆106Updated 10 months ago
- My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"☆73Updated 2 months ago
- TDF-ICDAR 2019 Dataset for Typeset Math Formula Detection☆68Updated 5 years ago
- Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks.☆124Updated 2 years ago
- Codebase for fine-tuning / evaluating nougat-based image2latex generation models☆153Updated 9 months ago
- Fully automated end-to-end framework to extract data from bar plots and other figures in scientific research papers using modules such as…☆113Updated 3 years ago
- Object Detection Model for Scanned Documents☆93Updated 3 months ago
- Official implementation for ICDAR 2021 best poster paper "Handwritten Mathematical Expression Recognition with Bidirectionally Trained Tr…☆124Updated last year
- A collection of OCR-related datasets☆173Updated 2 years ago
- Neural MMO - A Massively Multiagent Environment for Artificial Intelligence Research☆15Updated last year
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, …☆40Updated last year
- Scripts for downloading and pre-processing the `proof-pile`, a high quality dataset of mathematical text and code.☆19Updated 2 years ago
- Two approaches for robust TableQA: 1) ITR is a general-purpose retrieval-based approach for handling long tables in TableQA transformer m…☆39Updated last year
- ☆22Updated last year
- Converts from AsciiMath, LaTeX, MathML to LaTeX, MathML☆57Updated 5 years ago