sandinmyjoints / fold_to_ascii
A Python port of the Apache Lucene ASCII Folding Filter that converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the ‘Basic Latin’ Unicode block) into ASCII equivalents, if they exist.
☆15 Updated 4 years ago
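To illustrate the kind of folding described above, here is a minimal sketch that uses only Python's standard `unicodedata` module. It is not the fold_to_ascii API and not the Lucene mapping table: the function name `fold_to_ascii_sketch` is hypothetical, and the approach (NFKD decomposition followed by dropping combining marks) is a rough approximation. Characters with no Unicode decomposition, such as `ø`, pass through unchanged here, whereas a hand-built table like Lucene's can map them explicitly.

```python
import unicodedata


def fold_to_ascii_sketch(text: str) -> str:
    """Approximate ASCII folding (hypothetical helper, not the library's API).

    Decomposes each character with NFKD, then drops combining marks such as
    accents. Characters that have no ASCII equivalent under this scheme are
    kept unchanged, mirroring the "if they exist" caveat in the description.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    folded = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            # Skip accents and other combining marks left over by NFKD.
            continue
        folded.append(ch)
    return "".join(folded)


if __name__ == "__main__":
    print(fold_to_ascii_sketch("Crème Brûlée"))  # Creme Brulee
    print(fold_to_ascii_sketch("ﬁle"))           # file (ligature U+FB01 expands to "fi")
```

A table-driven port covers many more cases than this sketch, since it does not depend on Unicode decomposition data.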
Alternatives and similar repositories for fold_to_ascii:
Users who are interested in fold_to_ascii are comparing it to the libraries listed below.
- A simple fuzzy matching set for python strings ☆226 Updated 8 months ago
- A tiny library for Python text normalisation. Useful for ad-hoc text processing. ☆151 Updated 3 months ago
- Python bindings to the Compact Language Detector ☆33 Updated 4 years ago
- Hy-phen-ation made easy ☆212 Updated 2 months ago
- python library for extracting html microdata ☆166 Updated last year
- Python Solr query utility // http://solrq.readthedocs.org/en/latest/ ☆25 Updated 2 years ago
- Abydos NLP/IR library for Python ☆185 Updated 2 years ago
- RDFLib store using SQLAlchemy dbapi as back-end ☆153 Updated last year
- Language detection extension for spaCy 2.0+ ☆112 Updated 6 years ago
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… ☆169 Updated 3 years ago
- A Cython implementation of the affine gap string distance ☆57 Updated 2 years ago
- An asynchronous SPARQL client library using aiohttp ☆25 Updated 7 months ago
- Hidden alignment conditional random field for classifying string pairs. ☆24 Updated 7 months ago
- Python package for Google's diff-match-patch native C++ implementation. ☆76 Updated 10 months ago
- Utility library to turn country names into ISO two-letter codes ☆66 Updated 2 months ago
- A trend viewer written in Python/JavaScript ☆21 Updated 5 months ago
- Hunspell extension for spaCy 2.0. ☆94 Updated 8 months ago
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city. ☆62 Updated 8 years ago
- The most basic Text::Unidecode port (licensed under Artistic License or GPL or GPLv2+ - choose whatever you want) ☆66 Updated 2 years ago
- Modularly extensible semantic metadata validator ☆84 Updated 9 years ago
- A time machine for debugging pesky stateful errors. ☆35 Updated 8 years ago
- Regular Expression based parsers for extracting data from natural languages ☆70 Updated 7 years ago
- URL normalization for Python ☆94 Updated this week
- Parser and standardizer for politician, individual and organization names. ☆130 Updated 7 years ago
- Manage and load dataprotocols.org Data Packages ☆27 Updated 9 years ago
- Sunburnt offspring solr client ☆27 Updated 3 years ago
- Extract text from HTML ☆135 Updated 4 years ago
- Search 'from' and 'to' strings to learn a text cleaning mapping ☆17 Updated 9 years ago
- A simple command line interface to the datamade/dedupe library. ☆42 Updated 2 years ago
- Extract, parse and populate templates from strings ☆27 Updated 6 years ago