danigm / poppler
Personal clone of Poppler; the official repository is at https://gitlab.freedesktop.org/poppler/poppler
☆130 · Updated 6 years ago
Alternatives and similar repositories for poppler:
Users who are interested in poppler are comparing it to the libraries listed below.
- This is not the poppler repository. Please see https://poppler.freedesktop.org/ ☆53 · Updated 15 years ago
- Extremely Naive Charset Analyser ☆285 · Updated 5 months ago
- A more complete example of programming with PDFMiner, which continues where the default documentation stops ☆214 · Updated 5 years ago
- Command line tool to extract figures, tables, and captions from scholarly documents in PDF form. ☆130 · Updated 6 years ago
- Document converter using liblibreoffice - new canonical home (please update links and bookmarks): ☆25 · Updated 6 years ago
- [DEPRECATED - please use rups instead] RUPS is an abbreviation for Reading and Updating PDF Syntax. RUPS is a tool built on top of iText®… ☆110 · Updated 6 years ago
- Linguistic Annotation and Visualization Tool for PDF Documents ☆200 · Updated 5 years ago
- Lexical database of any language ☆178 · Updated 2 years ago
- k2pdfopt library for koreader, based on http://willus.com/k2pdfopt ☆99 · Updated 3 months ago
- A Lingoes dictionary file (LD2/LDX) reader/extractor. Written in C++ with Qt ☆77 · Updated 10 years ago
- Python binding to libpoppler with focus on text extraction ☆97 · Updated 3 years ago
- PoDoFo is a library to work with the PDF file format. The name comes from the first letter of PDF (Portable Document Format). A few tools… ☆52 · Updated 10 years ago
- Samples for using PDF Writer, used in my blog ☆35 · Updated 13 years ago
- A Python tool to reduce pdf2htmlEX output file size. ☆10 · Updated 10 years ago
- ImageMagick Legacy is a powerful, open-source software suite for creating, editing, converting, and manipulating images in over 200 forma… ☆211 · Updated this week
- Python API for Various DB-Backed Simhash Clusters ☆64 · Updated 8 years ago
- Mirror of https://gitlab.mister-muffin.de/josch/img2pdf for Travis and AppVeyor CI ☆535 · Updated this week
- LA-PDFText is a system for extracting accurate text from PDF-based research articles (and an interface to be able to improve performance … ☆82 · Updated 7 years ago
- Phantompy is a headless WebKit engine with a powerful Pythonic API built on top of Qt5 WebKit ☆612 · Updated 7 years ago
- Non-Overlapping Aho-Corasick Python extension, for Python 2 (str and unicode) and Python 3 ☆51 · Updated 9 years ago
- An extendable docx file format parser and converter ☆191 · Updated 4 years ago
- Small library containing various image processing algorithms (+ Python 3 bindings) that has almost no dependencies -- Moved to Gnome's Gi… ☆62 · Updated 6 years ago
- Extract meaningful content from pdf and psd file, such as texts and images both linked into a common JSON string ☆37 · Updated 7 years ago
- Python script to do PDF OCR conversion using Tesseract ☆374 · Updated last year
- OpenCC binding for Python. ☆52 · Updated 4 years ago
- A small framework taking over the manual training process described in the Tesseract3 Wiki: https://code.google.com/p/tesseract-ocr/wiki/… ☆131 · Updated last year
- Python module for reading StarDict dictionaries ☆45 · Updated last year
- Python bindings for CHMLIB ☆55 · Updated last year
- CLI for extracting text from PDF files (and possibly tables) ☆78 · Updated last week
- ☆152 · Updated 8 years ago