metachris / pdfx
Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
☆1,056 Updated last year
Alternatives and similar repositories for pdfx
Users who are interested in pdfx are comparing it to the libraries listed below.
- MOVED TO https://gitlab.com/crossref/pdfextract ☆509 Updated 7 years ago
- Scripts for Latex to HTML5 conversion ☆718 Updated last year
- Content ExtRactor and MINEr ☆494 Updated 2 years ago
- pdfrw is a pure Python library that reads and writes PDFs ☆1,893 Updated last year
- A fast and friendly PDF scraping library. ☆777 Updated last year
- Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip. ☆1,276 Updated 4 years ago
- A set of tools to allow PDF to XML conversion, utilising Apache Beam and other tools. The aim of this project is to bring multiple tools… ☆294 Updated 3 years ago
- Academic writing with Markdown ☆354 Updated 4 years ago
- extract text from any document. no muss. no fuss. ☆4,145 Updated 6 months ago
- The simplest way to extract text from PDFs in Python ☆428 Updated 2 years ago
- Convert LaTeX documents into beautiful responsive web pages using LaTeXML. ☆1,097 Updated last year
- A parser for Google Scholar, written in Python ☆2,146 Updated 2 years ago
- An open-source CRF Reference String Parsing Package ☆158 Updated 5 years ago
- A tool to create animated graph visualizations, based on graphviz. ☆496 Updated last year
- check for passive words, weasel words, duplicate words, typographical errors and words strunk & white don't like ☆587 Updated 6 years ago
- Bringing the python data stack to the shell prompt ☆787 Updated 4 years ago
- ☆880 Updated last year
- ☆1,574 Updated 3 years ago
- A Python data analysis library that is optimized for humans instead of machines. ☆1,181 Updated 3 months ago
- Fork of Pandoc for the implementation of a ScholarlyMarkdown parser ☆334 Updated 9 years ago
- A visual editor for research. ☆1,007 Updated 5 years ago
- A Python stream processing engine modeled after Yahoo! Pipes ☆1,601 Updated 3 years ago
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents. ☆2,242 Updated 2 years ago
- LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator. ☆1,083 Updated last week
- Easy color scales and color conversion for Python. ☆261 Updated 3 months ago
- A framework for creating semi-automatic web content extractors ☆501 Updated 7 months ago
- Renders papers from arXiv as responsive web pages so you don't have to squint at a PDF. ☆1,626 Updated 2 years ago
- Code for the paper Neural Generation of Regular Expressions from Natural Language with Minimal Domain Knowledge (EMNLP 2016). http://arxi… ☆430 Updated 8 years ago
- Science Parse parses scientific papers (in PDF form) and returns them in structured form. ☆659 Updated last year
- Documentation for Crossref's REST API. For questions or suggestions, see https://community.crossref.org/ ☆761 Updated 8 months ago