mpacula / AutoCorpus
AutoCorpus is a set of utilities that enable automatic extraction of language corpora and language models from publicly available datasets. AutoCorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.
☆37Updated 13 years ago
Alternatives and similar repositories for AutoCorpus
Users who are interested in AutoCorpus are comparing it to the libraries listed below.
- A Recurrent Neural Network trained on all existing TED Talk transcripts. The model outputs machine-generated TED Talks.☆51Updated 7 years ago
 - Speech Processing & Linguistic Analysis Tool☆11Updated 6 years ago
 - Generalized Language Modeling toolkit☆51Updated 3 years ago
 - finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests☆41Updated 3 years ago
 - NLTK Contrib☆166Updated last year
 - Uses a distributed word representation to find words along the hyperchord of two input words.☆102Updated 5 years ago
 - Random fun with statistical language models.☆63Updated 6 years ago
 - Json Wikipedia contains code to convert the Wikipedia XML dump into a JSON/Avro dump☆255Updated last year
 - NIST Language i-vector Machine Learning Challenge☆27Updated 9 years ago
 - Visualization for hidden Markov model computations☆14Updated 10 years ago
 - Fast Word Clustering Software☆78Updated 8 months ago
 - Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas…☆12Updated 11 years ago
 - bigram / trigram analysis of wikipedia; mainly mutual info☆22Updated 13 years ago
 - ☆20Updated 8 years ago
 - Barista is an open-source framework for concurrent speech processing.☆36Updated 11 years ago
 - A simple toolkit for speaker segmentation and identification☆30Updated 12 years ago
 - A hack to replace Pride & Prejudice text with closest word2vec model word, and visualize results.☆61Updated 10 years ago
 - ThoughtTreasure commonsense knowledge base and architecture for natural language processing☆79Updated 10 years ago
 - A visualizer for multi-dimensional semantic data☆38Updated 14 years ago
 - http://www.ark.cs.cmu.edu/ARKref/☆32Updated 11 years ago
 - Natural language Understanding Toolkit☆119Updated 11 years ago
 - Transition-based statistical parser☆417Updated 7 years ago
 - Grapheme-to-phoneme toolkit using joint modelling + CRFs in Java☆14Updated 7 years ago
 - Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles.☆38Updated 7 years ago
 - *Deprecated* A fast and accurate part-of-speech tagger for TextBlob.☆101Updated 9 years ago
 - Zurich Morphological Lexicon for German: a tool to extract a morphological lexicon from Wiktionary☆11Updated 2 years ago
 - Topic Model Analyzer☆62Updated 10 years ago
 - Turbo topics find significant multiword phrases in topics.☆46Updated 10 years ago
 - rapid nlp prototyping☆71Updated 3 years ago
 - Updates to Zope's keyphrase extractor (forked from 1.1.0)☆67Updated 8 years ago