mpacula / AutoCorpus
AutoCorpus is a set of utilities that enable automatic extraction of language corpora and language models from publicly available datasets. AutoCorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.
☆37 Updated 12 years ago
Alternatives and similar repositories for AutoCorpus:
Users who are interested in AutoCorpus are comparing it to the libraries listed below:
- Uses a distributed word representation to find words along the hyperchord of two input words. ☆101 Updated 4 years ago
- Visualization for hidden Markov model computations ☆14 Updated 10 years ago
- A Recurrent Neural Network trained on all existing TED Talk Transcripts. The model outputs machine generated TED Talks. ☆51 Updated 6 years ago
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests ☆41 Updated 2 years ago
- Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles. ☆38 Updated 6 years ago
- Generalized Language Modeling toolkit ☆51 Updated 2 years ago
- Read natural language interactive queries. Great for bots. ☆18 Updated 8 years ago
- Command-line corpus tools ☆9 Updated 7 years ago
- Speech modeling using code by Kratarth Goel http://dblp.uni-trier.de/pers/hd/g/Goel:Kratarth ☆9 Updated 9 years ago
- Basic dataset for the linguistic data collection. ☆15 Updated 7 years ago
- Random fun with statistical language models. ☆65 Updated 5 years ago
- The Community-enRiched Open WordNet (CROWN) ☆19 Updated 9 years ago
- Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps. ☆158 Updated 2 years ago
- Updates to Zope's keyphrase extractor (forked from 1.1.0) ☆67 Updated 7 years ago
- a port of the Wavenet algorithm to generate poems (using Samuel Graván's @Zeta36 code). ☆36 Updated 7 years ago
- ThoughtTreasure commonsense knowledge base and architecture for natural language processing ☆78 Updated 9 years ago
- ☆62 Updated 10 years ago
- http://www.ark.cs.cmu.edu/ARKref/ ☆32 Updated 10 years ago
- Zurich Morphological Lexicon for German: a tool to extract a morphological lexicon from Wiktionary ☆11 Updated last year
- Jitar HMM part of speech tagger ☆22 Updated 9 years ago
- Standalone Semanticizer ☆32 Updated 9 years ago
- A Python library for learning from dimensionality reduction, supporting sparse and dense matrices. ☆78 Updated 7 years ago
- A fork of the sofia ml machine learning program ☆14 Updated 13 years ago
- A simple toolkit for speaker segmentation and identification ☆30 Updated 11 years ago
- Grapheme to phoneme toolkit using joint-modelling + CRFs in java ☆13 Updated 6 years ago
- Speech act classifier for text based on Stanford CoreNLP and Weka ☆34 Updated 9 years ago
- code referenced in "Towards universal neural nets: Gibbs machines and ACE", Galin Georgiev, http://arxiv.org/abs/1508.06585 ☆14 Updated 9 years ago
- Compute association strength over semantic networks in a dimensionality-reduced form. ☆33 Updated 9 years ago