wzlxjtu / PDF2LaTeX-dataset
☆21Updated 4 years ago
Alternatives and similar repositories for PDF2LaTeX-dataset:
Users who are interested in PDF2LaTeX-dataset are comparing it to the libraries listed below.
- Scanning Single Shot Detector for Math in Document Images☆130Updated 2 years ago
- Another LaTeX equation OCR tool based on ConvNeXt and Transformer☆49Updated last year
- Solution to OpenAI's im2latex request for research☆89Updated last year
- Document Visual Question Answering dataset generator for English and Chinese☆24Updated 2 years ago
- TDF-ICDAR 2019 Dataset for Typeset Math Formula Detection☆68Updated 5 years ago
- An unofficial implementation of augment-XY-CUT from XYLayoutLM☆27Updated 2 years ago
- Codebase for fine-tuning / evaluating nougat-based image2latex generation models☆146Updated 7 months ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection☆105Updated 8 months ago
- A TensorFlow-based version of JianzhuZhang's Watch, Attend and Parse model☆19Updated 6 years ago
- A GPT-based generative LM for combined text and math formulas, leveraging tree-based formula encoding.☆35Updated last year
- Image-to-LaTeX PyTorch model☆14Updated last year
- DocBankLoader is a dataset loader for DocBank, and can convert DocBank to the Object Detection models' format.☆23Updated 4 years ago
- Training a reward model for RLHF using RWKV.☆14Updated last year
- 1st-place solution for the ICDAR 2021 Competition on Mathematical Formula Detection (winning formula detection solution)☆130Updated last year
- Python tools for processing the stackexchange data dumps into a text dataset for Language Models☆81Updated last year
- An implementation of Tiling and Corruption (TACo) Augmentations for OCR/HTR☆15Updated 3 years ago
- Code for ICPR2022 paper: "Graph Neural Networks and Representation Embedding for table extraction in PDF Documents"☆35Updated last year
- ☆9Updated 5 years ago
- PyTorch implementation for converting math equation images to LaTeX markup☆30Updated 4 years ago
- multimodal document analysis☆164Updated 10 months ago
- Math formula recognition (Images to LaTeX strings)☆300Updated last year
- The multilingual variant of GLM, a general language model trained with autoregressive blank infilling objective☆62Updated 2 years ago
- Python tools for creating suitable dataset for OpenAI's im2latex task: https://openai.com/requests-for-research/#im2latex☆137Updated 6 years ago
- A neural network capable of translating handwriting into text along with complex tools to generate datasets☆20Updated 5 years ago
- Chinese handwritten character recognition☆7Updated 6 years ago
- Chinese Mathematical Formula Detection (MFD) Dataset: a dataset for detecting mathematical formulas in Chinese documents☆34Updated 2 years ago
- ICDAR 2019: MaskRCNN on PubLayNet datasets. Paragraph detection, table detection, figure detection,...☆179Updated 3 years ago
- PyTorch implementation of a Deep CNN Encoder + LSTM Decoder with Attention for Image to LaTeX☆193Updated last year
- Python and JS tools to generate printed LaTeX formulas and images☆16Updated last year
- GTDB dataset for training & evaluation for mathematical OCR systems☆27Updated 4 years ago