mpacula / AutoCorpus
AutoCorpus is a set of utilities that enable automatic extraction of language corpora and language models from publicly available datasets. AutoCorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.
☆37Updated 13 years ago
Alternatives and similar repositories for AutoCorpus
Users who are interested in AutoCorpus compare it to the libraries listed below.
- A Recurrent Neural Network trained on all existing TED Talk Transcripts. The model outputs machine generated TED Talks.☆51Updated 7 years ago
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests☆41Updated 2 years ago
- Generalized Language Modeling toolkit☆51Updated 3 years ago
- Speech Processing & Linguistic Analysis Tool☆11Updated 6 years ago
- NLTK Contrib☆166Updated last year
- Visualization for hidden Markov model computations☆14Updated 10 years ago
- Uses a distributed word representation to find words along the hyperchord of two input words.☆102Updated 5 years ago
- The Community-enRiched Open WordNet (CROWN)☆18Updated 9 years ago
- a port of the Wavenet algorithm to generate poems (using Samuel Graván's @Zeta36 code).☆36Updated 8 years ago
- A specializer for Gaussian Mixture Models, based on the ASP framework☆43Updated 13 years ago
- Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles.☆38Updated 7 years ago
- The Kyoto Language Modeling Toolkit☆27Updated 10 years ago
- A Combinatory Categorial Grammar library.☆22Updated 11 years ago
- pronunciation LEXicons for Any Low-resource Language☆21Updated 5 years ago
- A simple toolkit for speaker segmentation and identification☆30Updated 12 years ago
- Excitement Open Platform for Recognizing Textual Entailments☆88Updated 7 years ago
- A visualizer for multi-dimensional semantic data☆38Updated 13 years ago
- NLP tools developed by Emory University.☆61Updated 9 years ago
- ☆55Updated 7 years ago
- Raplysaattori is software that detects rhymes and computes their lengths in English and Finnish rap lyrics.☆70Updated 7 years ago
- Grapheme-to-phoneme toolkit using joint modelling + CRFs in Java☆14Updated 7 years ago
- Fast Word Clustering Software☆78Updated 6 months ago
- Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic pr…☆69Updated 2 months ago
- Recurrent neural networks with Theano.☆28Updated 15 years ago
- ThoughtTreasure commonsense knowledge base and architecture for natural language processing☆79Updated 10 years ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas…☆12Updated 11 years ago
- English Dependency Relationship Extractor☆86Updated 8 months ago
- Zurich Morphological Lexicon for German: a tool to extract a morphological lexicon from Wiktionary☆11Updated 2 years ago
- Building and Using A Seed Corpus for the Human Language Project☆11Updated 7 years ago
- TiMBL implements several memory-based learning algorithms.☆53Updated 2 months ago