felipeochoa / minecart
Simple, Pythonic extraction of text, shapes and images from PDFs
☆78 · Updated 4 years ago
Related projects:
- Python interface to Apache PDFBox command-line tools. ☆75 · Updated last year
- Get list of common stop words in various languages in Python ☆155 · Updated 6 months ago
- Language detection extension for spaCy 2.0+ ☆111 · Updated 5 years ago
- A fully customisable language detection pipeline for spaCy ☆93 · Updated 5 years ago
- Hunspell extension for spaCy 2.0. ☆94 · Updated last month
- ☆10 · Updated this week
- The most basic Text::Unidecode port (licensed under Artistic License or GPL or GPLv2+ - choose whatever you want) ☆64 · Updated last year
- Library for unit extraction - fork of quantulum for python3 ☆134 · Updated 2 months ago
- (Official repo for pypi package) Python bindings for the Hunspell spellchecker engine ☆184 · Updated 3 years ago
- A more complete example of programming with PDFMiner, which continues where the default documentation stops ☆215 · Updated 4 years ago
- Guess gender from first name in Python 2 and 3 ☆129 · Updated 2 years ago
- Extracting tabular information from PDFs using python ☆42 · Updated 5 years ago
- A compound word splitter for Python ☆48 · Updated 3 years ago
- A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any othe… ☆65 · Updated last year
- THIS REPOSITORY IS FORK ☆30 · Updated last year
- Python address detector and parser ☆199 · Updated 9 months ago
- ☆46 · Updated this week
- Python binding to Poppler-cpp pdf library ☆95 · Updated 2 weeks ago
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city. ☆62 · Updated 7 years ago
- A tiny library for Python text normalisation. Useful for ad-hoc text processing. ☆144 · Updated 8 months ago
- Excel Integration with spaCy. Training NER using Excel/XLSX from PDF, DOCX, PPT, PNG or JPG. ☆105 · Updated last year
- ☆159 · Updated 3 months ago
- ☆68 · Updated 5 months ago
- Extract dates from text ☆64 · Updated 3 years ago
- Extra stopword lists for use with NLTK. ☆28 · Updated 11 months ago
- The simplest way to extract text from PDFs in Python ☆426 · Updated 2 years ago
- Soundex Phonetic Code Algorithm Demo for Indian Languages. Supports all indian languages and English. Provides intra-indic string compari… ☆54 · Updated 5 years ago
- Parse natural language time expressions in python ☆131 · Updated last year
- Generic Environment for Context-Aware Correction of Orthography ☆22 · Updated 2 years ago
- Python API for PDF documents ☆113 · Updated 2 weeks ago