grantjenks / python-wordsegment

English word segmentation, written in pure Python and based on a trillion-word corpus.

☆376

Alternatives and similar repositories for python-wordsegment

Users who are interested in python-wordsegment are comparing it to the libraries listed below.

pyenchant / pyenchant
spellchecking library for python
☆610 Updated last year
scoder / acora
Fast multi-keyword search engine for text strings
☆256 Updated 10 months ago
aboSamoor / pycld2
☆171 Updated 4 months ago
snowballstem / pystemmer
Python stemming library using snowball stemmers
☆263 Updated 2 months ago
fnl / segtok
Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe…
☆170 Updated 3 years ago
gutfeeling / word_forms
Accurately generate all possible forms of an English word, e.g. "election" --> "elect", "electoral", "electorate", etc.
☆635 Updated 4 years ago
fabianvf / python-rake
☆129 Updated 3 years ago
seomoz / simhash-py
Simhash and near-duplicate detection
☆416 Updated 2 years ago
gpoulter / python-ngram
Python Set subclass that supports searching by ngram similarity
☆119 Updated 3 years ago
proycon / pynlpl
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, an…
☆476 Updated last year
textpipe / textpipe
Textpipe: clean and extract metadata from text
☆302 Updated 4 years ago
miso-belica / jusText
Heuristic based boilerplate removal tool
☆786 Updated 5 months ago
mmautner / readability
a collection of functions that measure the readability of a given body of text
☆195 Updated 7 years ago
nreimers / truecaser
Language independent truecaser in Python.
☆160 Updated 3 years ago
pytries / datrie
Fast, efficiently stored Trie for Python. Uses libdatrie.
☆537 Updated last year
myint / language-check
Python wrapper for LanguageTool grammar checker
☆329 Updated 3 years ago
pyhunspell / pyhunspell
(Official repo for pypi package) Python bindings for the Hunspell spellchecker engine
☆186 Updated 4 years ago
lanl / pyxDamerauLevenshtein
pyxDamerauLevenshtein implements the Damerau-Levenshtein (DL) edit distance algorithm for Python in Cython for high performance.
☆246 Updated this week
bsolomon1124 / pycld3
Python3 bindings for the Compact Language Detector v3 (CLD3)
☆154 Updated 2 years ago
phatpiglet / autocorrect
Python 3 Spelling Corrector
☆177 Updated last year
slanglab / phrasemachine
Quickly extract multi-word phrases from a corpus
☆192 Updated 5 years ago
axiak / fuzzyset
A simple fuzzy matching set for python strings
☆228 Updated 11 months ago
TimKam / compound-word-splitter
A compound word splitter for Python
☆48 Updated 3 years ago
misja / python-boilerpipe
Python interface to Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages
☆543 Updated 4 years ago
fnl / syntok
Text tokenization and sentence segmentation (segtok v2)
☆205 Updated 3 years ago
google / pygtrie
Python library implementing a trie data structure.
☆824 Updated 4 years ago
bjascob / LemmInflect
A python module for English lemmatization and inflection.
☆268 Updated last year
dat / pyner
Python interface to the Stanford Named Entity Recognizer
☆292 Updated 3 years ago
pytries / marisa-trie
Static memory-efficient Trie-like structures for Python based on marisa-trie C++ library.
☆1,098 Updated last month
mpuig / spacy-lookup
Named Entity Recognition based on dictionaries
☆242 Updated 6 years ago