sandinmyjoints / fold_to_ascii
A Python port of the Apache Lucene ASCII Folding Filter that converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the ‘Basic Latin’ Unicode block) into ASCII equivalents, if they exist.
☆15 · Updated 4 years ago
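The folding idea in the description above can be roughly illustrated with the Python standard library alone. This is a minimal sketch, not the fold_to_ascii implementation and not its API: the function name `fold_sketch` and the `replacement` parameter are hypothetical, and the technique swapped in here is Unicode NFKD decomposition rather than the hand-built mapping table that the Lucene filter uses.

```python
import unicodedata


def fold_sketch(text: str, replacement: str = "") -> str:
    """Approximate ASCII folding via NFKD decomposition (hypothetical helper)."""
    # NFKD splits many accented letters and ligatures into a base
    # character followed by combining marks or simpler characters.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if ord(ch) < 128:
            # Already ASCII: keep unchanged.
            out.append(ch)
        elif unicodedata.combining(ch):
            # Combining mark (accent, diaeresis, ...): drop it.
            continue
        else:
            # No ASCII equivalent reachable by decomposition.
            out.append(replacement)
    return "".join(out)


print(fold_sketch("café"))      # accented letter folded to its base letter
print(fold_sketch("Ångström"))  # several accents in one word
```

Characters that have no Unicode decomposition, such as "ø", fall through to `replacement` in this sketch; a dedicated mapping table like the one in the Lucene filter can cover such cases.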
Alternatives and similar repositories for fold_to_ascii:
Users interested in fold_to_ascii are comparing it to the libraries listed below:
- A trend viewer written in Python/JavaScript ☆21 · Updated 4 months ago
- Sunburnt offspring solr client ☆27 · Updated 3 years ago
- Python bindings to the Compact Language Detector ☆33 · Updated 4 years ago
- A tiny library for Python text normalisation. Useful for ad-hoc text processing. ☆149 · Updated 2 months ago
- Python Solr query utility // http://solrq.readthedocs.org/en/latest/ ☆25 · Updated 2 years ago
- A Cython implementation of the affine gap string distance ☆57 · Updated 2 years ago
- Python port for IWNLP.Lemmatizer ☆17 · Updated last year
- Hidden alignment conditional random field for classifying string pairs. ☆24 · Updated 6 months ago
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… ☆169 · Updated 3 years ago
- Python binding for gumbo-parser using Cython ☆14 · Updated 8 years ago
- Extract, parse and populate templates from strings ☆27 · Updated 6 years ago
- Solr Dictionary Annotator (Microservice for Spark) ☆71 · Updated 5 years ago
- Python package for Google's diff-match-patch native C++ implementation. ☆74 · Updated 9 months ago
- Python package for harvesting records from OAI-PMH provider(s). ☆62 · Updated 2 years ago
- Tool for tweaking dbpedia spotlight's models ☆16 · Updated 7 years ago
- A time machine for debugging pesky stateful errors. ☆35 · Updated 8 years ago
- Repository for creating models, vocabulary and other necessities for Dutch in spaCy ☆11 · Updated 8 years ago
- Algorithms for "schema matching" ☆26 · Updated 8 years ago
- Language detection extension for spaCy 2.0+ ☆112 · Updated 6 years ago
- Utility library to turn country names into ISO two-letter codes ☆66 · Updated last month
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city. ☆62 · Updated 8 years ago
- ☆13 · Updated 7 years ago
- Hidden alignment conditional random field for classifying string pairs. ☆36 · Updated 7 years ago
- GermaNER: Free Open German Named Entity Recognition Tool ☆36 · Updated last year
- A fork of telescope, a SPARQL query building library for Python ☆11 · Updated 7 years ago
- The most basic Text::Unidecode port (licensed under Artistic License or GPL or GPLv2+ - choose whatever you want) ☆65 · Updated 2 years ago
- For extracting measurements and related entities from text ☆57 · Updated 4 years ago
- Semanticizest: dump parser and client ☆20 · Updated 8 years ago
- Index URLs in Common Crawl ☆194 · Updated 7 years ago
- A text tagger based on Lucene / Solr, using FST technology ☆176 · Updated last year