jerry2yu / ngrams

A C++ package for character or word ngram analysis. It uses a ternary search tree instead of a hash table for faster ngram frequency counting. Words are converted to unique IDs and encoded as more compact base-256 integers. It is a partial implementation of Dr. Vlado Keselj's Text-Ngrams 1.6, a very flexible ngram package written in Perl.
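To illustrate the idea of counting ngram frequencies with a ternary search tree, here is a minimal sketch for character ngrams. It is not the package's actual code: the class `TernarySearchTree`, its methods `increment` and `count`, and the helper `count_char_ngrams` are hypothetical names chosen for this example, and the word-ID and base-256 encoding step is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Illustrative ternary search tree mapping string keys to counts.
// All names are hypothetical; this is not the ngrams package's API.
class TernarySearchTree {
public:
    // Add 1 to the count of `key`, creating nodes along the way as needed.
    void increment(const std::string& key) {
        assert(!key.empty());  // empty keys are not stored in this sketch
        std::unique_ptr<Node>* link = &root_;
        std::size_t i = 0;
        for (;;) {
            const unsigned char c = static_cast<unsigned char>(key[i]);
            if (!*link) *link = std::make_unique<Node>(c);
            Node* node = link->get();
            if (c < node->ch) {
                link = &node->lo;           // smaller character: go left
            } else if (c > node->ch) {
                link = &node->hi;           // larger character: go right
            } else if (i + 1 < key.size()) {
                link = &node->eq;           // match: advance to next character
                ++i;
            } else {
                ++node->count;              // last character: key ends here
                return;
            }
        }
    }

    // Return how often `key` was inserted, or 0 if it is absent.
    std::size_t count(const std::string& key) const {
        const Node* node = root_.get();
        std::size_t i = 0;
        while (node && i < key.size()) {
            const unsigned char c = static_cast<unsigned char>(key[i]);
            if (c < node->ch) {
                node = node->lo.get();
            } else if (c > node->ch) {
                node = node->hi.get();
            } else if (i + 1 < key.size()) {
                node = node->eq.get();
                ++i;
            } else {
                return node->count;
            }
        }
        return 0;
    }

private:
    struct Node {
        unsigned char ch;                   // character stored at this node
        std::size_t count = 0;              // times a key ended at this node
        std::unique_ptr<Node> lo, eq, hi;   // less-than, equal, greater-than children
        explicit Node(unsigned char c) : ch(c) {}
    };
    std::unique_ptr<Node> root_;
};

// Count every character ngram of length `n` in `text` (hypothetical helper).
TernarySearchTree count_char_ngrams(const std::string& text, std::size_t n) {
    TernarySearchTree tree;
    if (n == 0) return tree;
    for (std::size_t i = 0; i + n <= text.size(); ++i) {
        tree.increment(text.substr(i, n));
    }
    return tree;
}
```

For example, `count_char_ngrams("abab", 2)` inserts the bigrams `ab`, `ba`, and `ab`, so `count("ab")` returns 2 and `count("ba")` returns 1. Each lookup compares one character at a time and never hashes the whole key, which is one plausible reason a ternary search tree can compete with a hash table on many short, overlapping keys.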
20 · Updated 9 years ago

Alternatives and similar repositories for ngrams:

Users that are interested in ngrams are comparing it to the libraries listed below