metachris / pdfx
Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
☆1,068 Updated 2 years ago
Alternatives and similar repositories for pdfx
Users who are interested in pdfx are comparing it to the libraries listed below.
- MOVED TO https://gitlab.com/crossref/pdfextract ☆510 Updated 8 years ago
- Content ExtRactor and MINEr ☆502 Updated 3 years ago
- Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip. ☆1,276 Updated 4 years ago
- Scripts for Latex to HTML5 conversion ☆718 Updated 2 years ago
- extract text from any document. no muss. no fuss. ☆4,302 Updated 9 months ago
- A fast and friendly PDF scraping library. ☆782 Updated last year
- Academic writing with Markdown ☆352 Updated 4 years ago
- Query Google Scholar with Python ☆295 Updated 2 weeks ago
- A set of tools to allow PDF to XML conversion, utilising Apache Beam and other tools. The aim of this project is to bring multiple tools… ☆293 Updated 3 years ago
- The simplest way to extract text from PDFs in Python ☆428 Updated 3 years ago
- Extract data from websites using basic statistical magic ☆505 Updated 4 years ago
- A framework for creating semi-automatic web content extractors ☆502 Updated 3 months ago
- Python script to do PDF OCR conversion using Tesseract ☆376 Updated 2 years ago
- Automatic Web Article Summarizer ☆417 Updated 4 years ago
- Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf fi… ☆1,596 Updated last year
- A Python data analysis library that is optimized for humans instead of machines. ☆1,193 Updated 3 weeks ago
- Import tables from any Wikipedia article as a dataset in Python ☆292 Updated 3 years ago
- High-level build project for all LAPDF-Text submodules ☆103 Updated 10 years ago
- Programmatic generation of high-quality CVs ☆1,136 Updated last year
- A CLI tool to convert CSV / Excel / HTML / JSON / Jupyter Notebook / LDJSON / LTSV / Markdown / SQLite / SSV / TSV / Google-Sheets to a S… ☆865 Updated 2 weeks ago
- Convert LaTeX documents into beautiful responsive web pages using LaTeXML. ☆1,098 Updated last year
- Easy color scales and color conversion for Python. ☆263 Updated 7 months ago
- [Not Maintained anymore] Python Module to get Meanings, Synonyms and what not for a given word ☆561 Updated 5 years ago
- Personal document manager (Linux/Windows) -- Moved to Gnome's Gitlab ☆2,433 Updated 7 years ago
- Camelot: PDF Table Extraction for Humans ☆3,701 Updated 2 years ago
- Bringing the python data stack to the shell prompt ☆787 Updated 4 years ago
- A PDF comparison utility in Python. ☆488 Updated 9 months ago
- Python command-line script for converting .csv data to LaTeX tables ☆221 Updated 6 years ago
- Creates audio supercuts. ☆957 Updated last year
- Documentation for Crossref's REST API. For questions or suggestions, see https://community.crossref.org/ ☆773 Updated last year