mpacula / AutoCorpus
AutoCorpus is a set of utilities that enable automatic extraction of language corpora and language models from publicly available datasets. AutoCorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.
☆37Updated 13 years ago
Alternatives and similar repositories for AutoCorpus
Users who are interested in AutoCorpus are comparing it to the libraries listed below.
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests☆41Updated 2 years ago
- Generalized Language Modeling toolkit☆51Updated 3 years ago
- NLTK Contrib☆166Updated last year
- A Recurrent Neural Network trained on all existing TED Talk Transcripts. The model outputs machine generated TED Talks.☆51Updated 7 years ago
- Uses a distributed word representation to find words along the hyperchord of two input words.☆102Updated 5 years ago
- Speech Processing & Linguistic Analysis Tool☆11Updated 6 years ago
- Turbo topics find significant multiword phrases in topics.☆46Updated 10 years ago
- Phonetic and phonological vocoding platform☆16Updated 8 years ago
- A Combinatory Categorial Grammar library.☆22Updated 11 years ago
- NIST Language i-vector Machine Learning Challenge☆27Updated 9 years ago
- ☆55Updated 7 years ago
- Excitement Open Platform for Recognizing Textual Entailments☆88Updated 7 years ago
- Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles.☆38Updated 7 years ago
- Visualization for hidden Markov model computations☆14Updated 10 years ago
- Zurich Morphological Lexicon for German: a tool to extract a morphological lexicon from Wiktionary☆11Updated 2 years ago
- NLP tools developed by Emory University.☆61Updated 9 years ago
- The Kyoto Language Modeling Toolkit☆27Updated 10 years ago
- Transition-based statistical parser☆417Updated 7 years ago
- The Community-enRiched Open WordNet (CROWN)☆18Updated 9 years ago
- Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipg…☆129Updated 9 months ago
- pronunciation LEXicons for Any Low-resource Language☆21Updated 5 years ago
- ThoughtTreasure commonsense knowledge base and architecture for natural language processing☆79Updated 10 years ago
- Grapheme to phoneme toolkit using joint-modelling + CRFs in Java☆14Updated 7 years ago
- ☆62Updated 11 years ago
- Standalone Semanticizer☆32Updated 10 years ago
- Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic pr…☆69Updated 3 months ago
- TiMBL implements several memory-based learning algorithms.☆53Updated 2 weeks ago
- Barista is an open-source framework for concurrent speech processing.☆36Updated 11 years ago
- Fast Word Clustering Software☆78Updated 7 months ago
- Open-source tools for morphological tagging, segmentation and stemming.☆40Updated 6 years ago