mpacula / AutoCorpus
AutoCorpus is a set of utilities that enable automatic extraction of language corpora and language models from publicly available datasets. AutoCorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.
☆37 · Updated 13 years ago
Alternatives and similar repositories for AutoCorpus:
Users who are interested in AutoCorpus are comparing it to the libraries listed below:
- Generalized Language Modeling toolkit ☆51 · Updated 2 years ago
- Basic dataset for the linguistic data collection. ☆15 · Updated 8 years ago
- A fork of the sofia ml machine learning program ☆14 · Updated 13 years ago
- The Community-enRiched Open WordNet (CROWN) ☆19 · Updated 9 years ago
- Theano implementation of the Neural GPU ☆15 · Updated 9 years ago
- code referenced in "Towards universal neural nets: Gibbs machines and ACE", Galin Georgiev, http://arxiv.org/abs/1508.06585 ☆14 · Updated 9 years ago
- A visualizer for multi-dimensional semantic data ☆38 · Updated 13 years ago
- A Cython wrapper to BLAS and LAPACK ☆44 · Updated 11 years ago
- An interactive map of English words, where words with similar meaning appear closer together. ☆22 · Updated 10 years ago
- A Recurrent Neural Network trained on all existing TED Talk Transcripts. The model outputs machine generated TED Talks. ☆51 · Updated 7 years ago
- Speech modeling using code by Kratarth Goel http://dblp.uni-trier.de/pers/hd/g/Goel:Kratarth ☆9 · Updated 10 years ago
- Updates to Zope's keyphrase extractor (forked from 1.1.0) ☆66 · Updated 7 years ago
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests ☆41 · Updated 2 years ago
- Simple natural language parsing and semantic grounding ☆10 · Updated 4 years ago
- Standalone Semanticizer ☆32 · Updated 10 years ago
- Django framework for crowdsourcing complex tasks using MTurk ☆64 · Updated 14 years ago
- Parsito: Fast non-projective transition-based dependency parser ☆14 · Updated 2 years ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 · Updated 10 years ago
- A specializer for Gaussian Mixture Models, based on the ASP framework ☆43 · Updated 12 years ago
- The Kyoto Language Modeling Toolkit ☆27 · Updated 10 years ago
- Latent Dirichlet Allocation with Gibbs sampling ☆16 · Updated 11 years ago
- natural language processing with link-grammar ☆18 · Updated 15 years ago
- various simple RNNs trained on synthetic grammars ☆30 · Updated 9 years ago
- Grapheme to phoneme toolkit using joint-modelling + CRFs in java ☆13 · Updated 6 years ago
- Barista is an open-source framework for concurrent speech processing. ☆36 · Updated 11 years ago
- A Python library for learning from dimensionality reduction, supporting sparse and dense matrices. ☆78 · Updated 7 years ago
- Textual Analysis of speeches using Google's Word2Vec Model ☆31 · Updated 4 years ago
- Hierarchical phrase-based machine translation system ☆32 · Updated 10 years ago
- Topic Model Analyzer ☆62 · Updated 9 years ago
- A web application for exploring documents topically. ☆26 · Updated 8 years ago