metachris / pdfx
Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
☆1,062 Updated 2 years ago
Alternatives and similar repositories for pdfx
Users who are interested in pdfx are comparing it to the libraries listed below.
- MOVED TO https://gitlab.com/crossref/pdfextract ☆509 Updated 7 years ago
- Content ExtRactor and MINEr ☆496 Updated 3 years ago
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents. ☆2,244 Updated 3 years ago
- Query Google Scholar with Python ☆295 Updated last year
- A fast and friendly PDF scraping library. ☆778 Updated last year
- Academic writing with Markdown ☆354 Updated 4 years ago
- A PDF comparison utility in Python. ☆482 Updated 7 months ago
- Python script to do PDF OCR conversion using Tesseract ☆375 Updated 2 years ago
- Documentation for Crossref's REST API. For questions or suggestions, see https://community.crossref.org/ ☆766 Updated 9 months ago
- A tool to create animated graph visualizations, based on graphviz. ☆498 Updated last year
- Scripts for Latex to HTML5 conversion ☆718 Updated last year
- Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip. ☆1,274 Updated 4 years ago
- pdfrw is a pure Python library that reads and writes PDFs ☆1,896 Updated last year
- Fork of Pandoc for the implementation of a ScholarlyMarkdown parser ☆334 Updated 10 years ago
- Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf fi… ☆1,592 Updated last year
- A machine learning software for extracting information from scholarly documents ☆4,182 Updated this week
- extract text from any document. no muss. no fuss. ☆4,184 Updated 7 months ago
- Convert LaTeX documents into beautiful responsive web pages using LaTeXML. ☆1,098 Updated last year
- Extract data from websites using basic statistical magic ☆505 Updated 4 years ago
- Automatic Web Article Summarizer ☆417 Updated 3 years ago
- Python command-line script for converting .csv data to LaTeX tables ☆219 Updated 6 years ago
- Camelot: PDF Table Extraction for Humans ☆3,694 Updated 2 years ago
- Given a scholarly PDF, extract figures, tables, captions, and section titles. ☆676 Updated last year
- The simplest way to extract text from PDFs in Python ☆428 Updated 3 years ago
- A framework for creating semi-automatic web content extractors ☆502 Updated 3 weeks ago
- Extract bibliographic references from (High-Energy Physics) articles. ☆137 Updated 3 weeks ago
- A parser for Google Scholar, written in Python ☆2,154 Updated 2 years ago
- A post-processing tool for scanned sheets of paper. ☆1,088 Updated last year
- Pyzotero: a Python client for the Zotero API ☆1,055 Updated 2 weeks ago
- Create, edit and display a journal article, entirely in GitHub ☆619 Updated 2 years ago