metachris / pdfx
Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
☆1,058 Updated 2 years ago
Alternatives and similar repositories for pdfx
Users who are interested in pdfx are comparing it to the libraries listed below.
- MOVED TO https://gitlab.com/crossref/pdfextract ☆509 Updated 7 years ago
- Content ExtRactor and MINEr ☆497 Updated 2 years ago
- extract text from any document. no muss. no fuss. ☆4,173 Updated 6 months ago
- Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip. ☆1,275 Updated 4 years ago
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents. ☆2,243 Updated 3 years ago
- Query Google Scholar with Python ☆295 Updated last year
- A PDF comparison utility in Python. ☆477 Updated 6 months ago
- The simplest way to extract text from PDFs in Python ☆428 Updated 2 years ago
- Python script to do PDF OCR conversion using Tesseract ☆375 Updated 2 years ago
- A tool to create animated graph visualizations, based on graphviz. ☆498 Updated last year
- A Python data analysis library that is optimized for humans instead of machines. ☆1,184 Updated 2 weeks ago
- Python PDF Parser (Not actively maintained). Check out pdfminer.six. ☆5,293 Updated 2 years ago
- Python implementation of TextRank algorithms ("textgraphs") for phrase extraction ☆2,178 Updated last week
- A visual editor for research. ☆1,007 Updated 5 years ago
- Documentation for Crossref's REST API. For questions or suggestions, see https://community.crossref.org/ ☆764 Updated 8 months ago
- Automatic Web Article Summarizer ☆417 Updated 3 years ago
- Pyzotero: a Python client for the Zotero API ☆1,041 Updated 2 weeks ago
- Document processing for investigations ☆251 Updated 8 years ago
- A Python library for creating LaTeX files ☆2,329 Updated 10 months ago
- Academic writing with Markdown ☆354 Updated 4 years ago
- pdfrw is a pure Python library that reads and writes PDFs ☆1,895 Updated last year
- A library for reading text files over multiple cores. ☆1,055 Updated last year
- Science Parse parses scientific papers (in PDF form) and returns them in structured form. ☆665 Updated last year
- Given a scholarly PDF, extract figures, tables, captions, and section titles. ☆671 Updated last year
- Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf fi… ☆1,592 Updated last year
- A proof of concept using IBM's Speech-to-Text API to do quick-and-dirty transcriptions ☆312 Updated 8 years ago
- A high performance bibliographic information service: https://biblio-glutton.readthedocs.io ☆139 Updated this week
- Monitor the output of terminals and processes. ☆1,013 Updated 9 years ago
- Bibtex parser for Python 3 ☆530 Updated 6 months ago
- A toolkit for making domain-specific probabilistic parsers ☆803 Updated 8 months ago