mpacula / AutoCorpus
AutoCorpus is a set of utilities that enable automatic extraction of language corpora and language models from publicly available datasets. AutoCorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.
☆37, updated 13 years ago
Alternatives and similar repositories for AutoCorpus:
Users interested in AutoCorpus are comparing it to the libraries listed below.
- Generalized Language Modeling toolkit (☆51, updated 2 years ago)
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests (☆41, updated 2 years ago)
- A visualizer for multi-dimensional semantic data (☆38, updated 13 years ago)
- http://www.ark.cs.cmu.edu/ARKref/ (☆32, updated 10 years ago)
- Speech modeling using code by Kratarth Goel http://dblp.uni-trier.de/pers/hd/g/Goel:Kratarth (☆9, updated 10 years ago)
- Visualization for hidden Markov model computations (☆14, updated 10 years ago)
- Basic dataset for the linguistic data collection. (☆15, updated 8 years ago)
- A Recurrent Neural Network trained on all existing TED Talk Transcripts. The model outputs machine generated TED Talks. (☆51, updated 6 years ago)
- The Community-enRiched Open WordNet (CROWN) (☆19, updated 9 years ago)
- ThoughtTreasure commonsense knowledge base and architecture for natural language processing (☆78, updated 9 years ago)
- This repository contains a tool and a collection dataset for detecting off-topic pages in archived Web collections. (☆18, updated 9 years ago)
- Recurrent Neural Network language modeling toolkit (☆38, updated 11 years ago)
- Parsito: Fast non-projective transition-based dependency parser (☆14, updated 2 years ago)
- Speech Processing & Linguistic Analysis Tool (☆10, updated 5 years ago)
- A fork of the sofia ml machine learning program (☆14, updated 13 years ago)
- Random fun with statistical language models. (☆65, updated 5 years ago)
- Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles. (☆38, updated 6 years ago)
- pronunciation LEXicons for Any Low-resource Language (☆21, updated 4 years ago)
- Compute association strength over semantic networks in a dimensionality-reduced form. (☆32, updated 9 years ago)
- Fast Word Clustering Software (☆78, updated 2 weeks ago)
- Natural Logic Inference for Common Sense Reasoning (☆61, updated 6 years ago)
- Open-source tools for morphological tagging, segmentation and stemming. (☆41, updated 5 years ago)
- Code for morphological transformations (☆29, updated 7 years ago)
- Offline extractor of synchronous context-free grammars for machine translation. (☆31, updated 9 years ago)
- Python natural language processing work (☆29, updated 15 years ago)
- a port of the Wavenet algorithm to generate poems (using Samuel Graván's @Zeta36 code). (☆36, updated 7 years ago)
- Natural Language Question Answering Engine (☆33, updated 10 years ago)
- This is the text partitioner project for Python. (☆21, updated 6 years ago)
- A Combinatory Categorial Grammar library. (☆22, updated 11 years ago)