dsidavis / pdftohtml
copy of pdftohtml code with enhancements
☆25 Updated last year
Alternatives and similar repositories for pdftohtml:
Users who are interested in pdftohtml are comparing it to the repositories listed below.
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 Updated last year
- PolyTeX to LaTeX and HTML ☆48 Updated last month
- A visualization tool to support reviewing the scientific literature ☆14 Updated 6 years ago
- Python binding to libpoppler with focus on text extraction ☆97 Updated 3 years ago
- Hierarchical phrase-based machine translation system ☆32 Updated 10 years ago
- Artificial Intelligence Knowledge Information Framework ☆55 Updated last year
- Experimental extraction of DOI citation information from Reddit submission dump. ☆8 Updated 9 years ago
- Citation Style Language utilities ☆18 Updated 3 years ago
- Plugin to use rich text in Annotator ☆30 Updated 10 years ago
- Offline storage for the Annotator ☆43 Updated 7 years ago
- PDF Table Extractor - repository to hold revisable version of code from https://www.cvast.tuwien.ac.at/projects/pdf2table by Burcu Yildiz ☆38 Updated last year
- Python library for creating, reading, and modifying Docx files for Microsoft Word ☆15 Updated 10 years ago
- High-level build project for all LAPDF-Text submodules ☆103 Updated 9 years ago
- This is the core libferris repository. It is the primary tree for development as at 2015. ☆22 Updated 6 months ago
- Multi-Entity Extraction Framework for Academic Documents (with default extraction tools) ☆31 Updated last year
- Navigating around a grid of cells like XPath for spreadsheets; supports Python 3.5+ ☆48 Updated 2 years ago
- PDF Extraction Toolkit ☆41 Updated 4 years ago
- Authorea's TeX-based stylist. Automatically style research documents ☆26 Updated 8 years ago
- Zorba - the NoSQL processor ☆42 Updated last year
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆24 Updated 7 years ago
- A Scrivener 2.2 document that uses MMD 3 and BibDesk to compile into an academic thesis PDF via LaTeX. ☆59 Updated 13 years ago
- Qute Text Editor - built with web technologies ☆130 Updated 11 years ago
- stav text annotation visualiser ☆34 Updated 13 years ago
- slide show (s9) docs ☆12 Updated 7 years ago
- Command-line HTML to XHTML converter ☆44 Updated 6 months ago
- Structured Data from PDF image-based files ☆88 Updated 12 years ago
- Stanford Tregex-inspired language for rule-based dependency tree manipulation. ☆21 Updated 8 years ago
- Basic CSS Template for a latex article converted into HTML. ☆43 Updated 11 years ago
- Node.js port of EtherCalc with widgets ☆10 Updated 7 years ago
- Manifests of the public domain images uploaded to Flickr Commons, with descriptive information about the books they were taken from. ☆75 Updated 11 years ago