metachris / pdfx
Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
☆1,073 · Updated 2 years ago
Alternatives and similar repositories for pdfx
Users who are interested in pdfx are comparing it to the libraries listed below.
- MOVED TO https://gitlab.com/crossref/pdfextract ☆510 · Updated 8 years ago
- Scripts for Latex to HTML5 conversion ☆717 · Updated 2 years ago
- Content ExtRactor and MINEr ☆509 · Updated 3 years ago
- A fast and friendly PDF scraping library. ☆783 · Updated 2 years ago
- Python script to do PDF OCR conversion using Tesseract ☆376 · Updated 2 years ago
- Academic writing with Markdown ☆354 · Updated 4 years ago
- A tool to create animated graph visualizations, based on graphviz. ☆505 · Updated 2 years ago
- Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip. ☆1,279 · Updated 5 years ago
- Query Google Scholar with Python ☆294 · Updated 2 weeks ago
- A PDF comparison utility in Python. ☆502 · Updated last year
- Extract data from websites using basic statistical magic ☆505 · Updated 5 years ago
- Create, edit and display a journal article, entirely in GitHub ☆619 · Updated 3 years ago
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents. ☆2,251 · Updated 3 years ago
- Automatic Web Article Summarizer ☆415 · Updated 4 years ago
- Bibtex parser for Python 3 ☆557 · Updated 11 months ago
- Import tables from any Wikipedia article as a dataset in Python ☆294 · Updated 4 years ago
- Clean Thesis is a clean, simple, and elegant LaTeX style (or template) for thesis documents. ☆922 · Updated last year
- Bringing the python data stack to the shell prompt ☆787 · Updated 4 years ago
- Extract bibliographic references from (High-Energy Physics) articles. ☆138 · Updated 2 weeks ago
- A Python data analysis library that is optimized for humans instead of machines. ☆1,195 · Updated 2 weeks ago
- pdfrw is a pure Python library that reads and writes PDFs ☆1,912 · Updated last year
- Python command-line script for converting .csv data to LaTeX tables ☆222 · Updated 6 years ago
- Convert LaTeX documents into beautiful responsive web pages using LaTeXML. ☆1,099 · Updated last year
- Documentation for Crossref's REST API. For questions or suggestions, see https://community.crossref.org/ ☆783 · Updated last year
- Document processing for investigations ☆250 · Updated 8 years ago
- The simplest way to extract text from PDFs in Python ☆428 · Updated 3 years ago
- Fork of Pandoc for the implementation of a ScholarlyMarkdown parser ☆334 · Updated 10 years ago
- A Python library for creating LaTeX files ☆2,350 · Updated last year
- Bibcure helps in boring tasks by keeping your bibfile up to date and normalized...also allows you to easily download all papers inside yo… ☆206 · Updated 2 years ago
- A framework for creating semi-automatic web content extractors ☆503 · Updated 5 months ago