mpacula / AutoCorpus
AutoCorpus is a set of utilities that enable automatic extraction of language corpora and language models from publicly available datasets. AutoCorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.
☆37 Updated 14 years ago
Alternatives and similar repositories for AutoCorpus
Users who are interested in AutoCorpus are comparing it to the libraries listed below
- A Recurrent Neural Network trained on all existing TED Talk Transcripts. The model outputs machine generated TED Talks. ☆51 Updated 7 years ago
- Speech Processing & Linguistic Analysis Tool ☆11 Updated 6 years ago
- Generalized Language Modeling toolkit ☆51 Updated 3 years ago
- NLTK Contrib ☆169 Updated last year
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests ☆41 Updated 3 years ago
- Uses a distributed word representation to find words along the hyperchord of two input words. ☆102 Updated 5 years ago
- A visualizer for multi-dimensional semantic data ☆38 Updated 14 years ago
- Vector Space Model Framework developed for InPhO ☆39 Updated 8 months ago
- a port of the Wavenet algorithm to generate poems (using Samuel Graván's @Zeta36 code). ☆36 Updated 8 years ago
- bigram / trigram analysis of wikipedia; mainly mutual info ☆22 Updated 13 years ago
- Visualization for hidden Markov model computations ☆14 Updated 11 years ago
- The Community-enRiched Open WordNet (CROWN) ☆18 Updated 10 years ago
- Random fun with statistical language models. ☆63 Updated 6 years ago
- Standalone Semanticizer ☆32 Updated 10 years ago
- code referenced in "Towards universal neural nets: Gibbs machines and ACE", Galin Georgiev, http://arxiv.org/abs/1508.06585 ☆14 Updated 10 years ago
- Turbo topics find significant multiword phrases in topics. ☆46 Updated 10 years ago
- A simple toolkit for speaker segmentation and identification ☆31 Updated 12 years ago
- Barista is an open-source framework for concurrent speech processing. ☆36 Updated 11 years ago
- ☆19 Updated 10 years ago
- pronunciation LEXicons for Any Low-resource Language ☆21 Updated 5 years ago
- Grapheme to phoneme toolkit using joint-modelling + CRFs in java ☆14 Updated 7 years ago
- NIST Language i-vector Machine Learning Challenge ☆27 Updated 9 years ago
- Topic Model Analyzer ☆62 Updated 10 years ago
- Fast Word Clustering Software ☆79 Updated 11 months ago
- Topic modeling web application ☆40 Updated 10 years ago
- DKPro WSD: A Java framework for word sense disambiguation ☆20 Updated 3 years ago
- Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipg… ☆130 Updated last year
- rapid nlp prototyping ☆71 Updated 3 years ago
- ☆38 Updated 9 years ago
- A Python interface to OpenFst ☆88 Updated 6 years ago