JDonner / NoAho

Non-Overlapping Aho-Corasick Python extension, for Python 2 (str and unicode) and Python 3

☆51

Alternatives and similar repositories for NoAho

Users who are interested in NoAho are comparing it to the libraries listed below.

nlehuen / pytst
C++ Ternary Search Tree implementation with Python bindings
☆43 Updated 7 years ago
scoder / acora
Fast multi-keyword search engine for text strings
☆256 Updated 10 months ago
pytries / hat-trie
HAT-Trie for Python
☆86 Updated 9 years ago
willf / segment
A tool to segment text based on frequencies and the Viterbi algorithm "#TheBoyWhoLived" => ['#', 'The', 'Boy', 'Who', 'Lived']
☆81 Updated 9 years ago
piskvorky / gensim-simserver
[NO LONGER MAINTAINED AS OPEN SOURCE - USE SCALETEXT.COM INSTEAD]
☆108 Updated 12 years ago
scrapinghub / python-simhash
An efficient simhash implementation for python
☆125 Updated 5 years ago
adsva / python-wapiti
Python bindings for libwapiti
☆67 Updated 5 years ago
jimmycallin / pydsm
A Python framework for exploring distributional semantic models.
☆85 Updated 9 years ago
wharris / esmre
Python extension module for accelerating regular expressions using libesm
☆132 Updated last year
proycon / python-ucto
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet…
☆29 Updated 7 months ago
andychisholm / sift
Knowledge extraction from web data
☆92 Updated 7 years ago
ai-ku / wvec
Word vectors
☆64 Updated 7 years ago
mattandahalfew / Levenshtein_search
Python search module for fast approximate string matching
☆54 Updated 2 years ago
explosion / preshed
💥 Cython hash tables that assume keys are pre-hashed
☆86 Updated 2 months ago
JonathanRaiman / wikipedia_ner
Labeled examples from wiki dumps in Python
☆67 Updated 8 years ago
GregBowyer / cld2-cffi
Python bindings to the Compact Language Detector
☆33 Updated 5 years ago
sloria / textblob-aptagger
*Deprecated* A fast and accurate part-of-speech tagger for TextBlob.
☆102 Updated 9 years ago
semanticize / semanticizest
Standalone Semanticizer
☆32 Updated 10 years ago
ahupp / bktree
Implementation of Burkhard-Keller trees in various languages
☆52 Updated 15 years ago
fnl / segtok
Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe…
☆170 Updated 3 years ago
RaRe-Technologies / topic_eval
Tools and services for evaluating topic models
☆15 Updated 9 years ago
jonsafari / clustercat
Fast Word Clustering Software
☆78 Updated 5 months ago
interrogator / corpkit
A toolkit for corpus linguistics
☆205 Updated 6 years ago
cslu-nlp / DetectorMorse
Fast supervised sentence boundary detection using the averaged perceptron
☆90 Updated 6 years ago
piskvorky / sim-shootout
Code for "Performance shootout between nearest-neighbour libraries": http://radimrehurek.com/2013/11/performance-shootout-of-nearest-neig…
☆99 Updated 10 years ago
gr33ndata / dmoz-urlclassifier
Preparing DMOZ dataset for my n-Gram LM-based URL classification research
☆32 Updated 10 years ago
gouwsmeister / TextCleanser
Normalizes lexically ill-formed text to its most likely clean text, e.g. "c u thr 2nite!" -> "see you there tonight!".
☆63 Updated 9 years ago
tomazk / Text-Extraction-Evaluation
Framework for evaluating text extraction algorithms implemented as web services
☆42 Updated 13 years ago
seomoz / qdr
Query-Document Relevance
☆42 Updated 10 years ago
saadtazi / word2vec-query-expansion
An Apache Lucene TokenFilter that uses word2vec vectors for term expansion.
☆24 Updated 11 years ago