wzlxjtu / PDF2LaTeX-dataset
☆21Updated 5 years ago
Alternatives and similar repositories for PDF2LaTeX-dataset
Users who are interested in PDF2LaTeX-dataset are comparing it to the repositories listed below
- Scanning Single Shot Detector for Math in Document Images☆131Updated 2 years ago
- Solution to OpenAI's im2latex request for research☆90Updated last year
- A GPT-based generative LM for combined text and math formulas, leveraging tree-based formula encoding.☆40Updated 2 years ago
- A command line interface to download PDF files from https://arxiv.org.☆52Updated last year
- Generator for document visual question answering datasets in English and Chinese☆24Updated 2 years ago
- Codebase for fine-tuning / evaluating nougat-based image2latex generation models☆154Updated 9 months ago
- Flask app for article abstract and listing pages☆152Updated this week
- An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.☆17Updated 3 years ago
- Image to LaTeX pytorch model☆14Updated 2 years ago
- Another LaTeX equation OCR tool based on ConvNeXt and Transformer☆50Updated last year
- ReadingBank: A Benchmark Dataset for Reading Order Detection☆106Updated 10 months ago
- Unofficial implementation of augment-XY-CUT from XYLayoutLM☆28Updated 3 years ago
- JAX implementations of RWKV☆19Updated last year
- arXiv Search UI & APIs☆118Updated last month
- TDF-ICDAR 2019 Dataset for Typeset Math Formula Detection☆68Updated 5 years ago
- Training a reward model for RLHF using RWKV.☆14Updated 2 years ago
- ☆9Updated 5 years ago
- Implementation of the SOTA Transformer architecture from PaLM - Scaling Language Modeling with Pathways in JAX/Flax☆13Updated 3 years ago
- The Soft Cosine Measure system developed for the ARQMath-3 shared task evaluation of math information retrieval systems☆13Updated 2 years ago
- GHOSTS dataset☆38Updated 2 years ago
- A working Docker image for the Maxtract program that converts pdf to LaTeX sources☆14Updated 5 years ago
- Data repository for LaTeX OCR☆126Updated last year
- MozoLM: A language model (LM) serving library☆45Updated last week
- Python tools for processing the stackexchange data dumps into a text dataset for Language Models☆83Updated last year
- Old book pages (with groundtruth), formerly used for OCR studies. There are several versions of the set (concerning resolution and binari…☆12Updated 7 years ago
- Python and JS tools to generate printed LaTeX formulas and images☆16Updated last year
- The multilingual variant of GLM, a general language model trained with autoregressive blank infilling objective☆62Updated 2 years ago
- Python tools for creating suitable dataset for OpenAI's im2latex task: https://openai.com/requests-for-research/#im2latex☆139Updated 6 years ago
- Apache PDFBox extension for precisely extracting character/symbol locations and identities from born-digital PDF files.☆19Updated 3 years ago
- Fast stand-alone C++ decoder for RNN-based NMT models☆26Updated 4 years ago