metachris / pdfx
Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
☆1,051 Updated last year
Alternatives and similar repositories for pdfx:
Users who are interested in pdfx are comparing it to the libraries listed below
- MOVED TO https://gitlab.com/crossref/pdfextract ☆508 Updated 7 years ago
- Python script to do PDF OCR conversion using Tesseract ☆374 Updated last year
- A framework for creating semi-automatic web content extractors ☆501 Updated 4 months ago
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents. ☆2,232 Updated 2 years ago
- A Python to Vega translator ☆2,032 Updated 8 years ago
- Bringing the python data stack to the shell prompt ☆789 Updated 4 years ago
- A parser for Google Scholar, written in Python ☆2,130 Updated 2 years ago
- A set of tools to allow PDF to XML conversion, utilising Apache Beam and other tools. The aim of this project is to bring multiple tools… ☆294 Updated 2 years ago
- Scripts for Latex to HTML5 conversion ☆720 Updated last year
- A Python data analysis library that is optimized for humans instead of machines. ☆1,179 Updated last month
- Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip. ☆1,275 Updated 4 years ago
- Query Google Scholar with Python ☆294 Updated last year
- Extract data from websites using basic statistical magic ☆505 Updated 4 years ago
- extract text from any document. no muss. no fuss. ☆4,013 Updated 3 months ago
- Instant access to many datasets in Python. ☆941 Updated 3 years ago
- Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf fi… ☆1,584 Updated last year
- A fast and friendly PDF scraping library. ☆774 Updated last year
- Automatic Web Article Summarizer ☆415 Updated 3 years ago
- ☆879 Updated last year
- a plotting library for Python, based on D3 ☆1,419 Updated 4 years ago
- A visual editor for research. ☆1,006 Updated 5 years ago
- A tool to create animated graph visualizations, based on graphviz. ☆492 Updated last year
- A Python stream processing engine modeled after Yahoo! Pipes ☆1,604 Updated 3 years ago
- Bibcure helps in boring tasks by keeping your bibfile up to date and normalized...also allows you to easily download all papers inside yo… ☆200 Updated 2 years ago
- A toolkit for making domain-specific probabilistic parsers ☆800 Updated 6 months ago
- Webkit based scriptable web browser for python. ☆2,762 Updated last year
- Was an interactive continuous Python profiler. ☆2,959 Updated 4 years ago
- A CLI tool to convert CSV / Excel / HTML / JSON / Jupyter Notebook / LDJSON / LTSV / Markdown / SQLite / SSV / TSV / Google-Sheets to a S… ☆860 Updated 10 months ago
- Document processing for investigations ☆250 Updated 8 years ago
- A library for reading text files over multiple cores. ☆1,055 Updated last year