metachris / pdfx
Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
☆1,052 · Updated last year
Alternatives and similar repositories for pdfx:
Users who are interested in pdfx compare it to the libraries listed below.
- MOVED TO https://gitlab.com/crossref/pdfextract ☆508 · Updated 7 years ago
- Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip. ☆1,276 · Updated 4 years ago
- Query Google Scholar with Python ☆294 · Updated last year
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents. ☆2,237 · Updated 2 years ago
- extract text from any document. no muss. no fuss. ☆4,072 · Updated 4 months ago
- Content ExtRactor and MINEr ☆494 · Updated 2 years ago
- A parser for Google Scholar, written in Python ☆2,131 · Updated 2 years ago
- Documentation for Crossref's REST API. For questions or suggestions, see https://community.crossref.org/ ☆755 · Updated 6 months ago
- Scripts for Latex to HTML5 conversion ☆717 · Updated last year
- A Python stream processing engine modeled after Yahoo! Pipes ☆1,602 · Updated 3 years ago
- Text page dewarping using a "cubic sheet" model ☆1,467 · Updated 2 years ago
- A Python library for creating LaTeX files ☆2,313 · Updated 8 months ago
- Convert LaTeX documents into beautiful responsive web pages using LaTeXML. ☆1,087 · Updated last year
- Python script to do PDF OCR conversion using Tesseract ☆374 · Updated last year
- A visual editor for research. ☆1,008 · Updated 5 years ago
- Academic writing with Markdown ☆354 · Updated 3 years ago
- Fork of Pandoc for the implementation of a ScholarlyMarkdown parser ☆334 · Updated 9 years ago
- Scan, index, and archive all of your paper documents (acquired by Mayan EDMS) ☆2,560 · Updated 6 years ago
- Given a scholarly PDF, extract figures, tables, captions, and section titles. ☆651 · Updated last year
- Bibtex parser for Python 3 ☆518 · Updated 4 months ago
- A PDF comparison utility in Python. ☆471 · Updated 4 months ago
- A Python data analysis library that is optimized for humans instead of machines. ☆1,177 · Updated last month
- A Python to Vega translator ☆2,032 · Updated 8 years ago
- A post-processing tool for scanned sheets of paper. ☆1,070 · Updated 9 months ago
- Fast C based HTML 5 parsing for python ☆687 · Updated 7 months ago
- Handwritten math expression parser ☆684 · Updated 4 years ago
- A fast and friendly PDF scraping library. ☆777 · Updated last year
- Create, edit and display a journal article, entirely in GitHub ☆619 · Updated 2 years ago
- Interactive plotting for Python. ☆438 · Updated 6 months ago
- Automatic Web Article Summarizer ☆415 · Updated 3 years ago