mpacula / AutoCorpus
AutoCorpus is a set of utilities that enable automatic extraction of language corpora and language models from publicly available datasets. AutoCorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.
☆37 · Updated 13 years ago
Alternatives and similar repositories for AutoCorpus
Users who are interested in AutoCorpus are comparing it to the libraries listed below.
- Generalized Language Modeling toolkit ☆51 · Updated 3 years ago
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests ☆41 · Updated 2 years ago
- Compute the most likely permutation of a lattice given an LM ☆10 · Updated 12 years ago
- EESEN based offline transcriber VM using models trained on TEDLIUM and Cantab Research ☆49 · Updated 6 years ago
- This is EllaVator project to build Ella the talking eleVator as part of a Saarland University software project class. ☆17 · Updated 9 years ago
- Barista is an open-source framework for concurrent speech processing. ☆36 · Updated 11 years ago
- A visualizer for multi-dimensional semantic data ☆38 · Updated 13 years ago
- pronunciation LEXicons for Any Low-resource Language ☆21 · Updated 5 years ago
- Grapheme to phoneme toolkit using joint-modelling + CRFs in java ☆14 · Updated 7 years ago
- Top level code to transcribe English audio/video files into text/subtitles ☆20 · Updated 7 years ago
- Utilities for manipulating finite state transducers with the OpenFst library. ☆32 · Updated 8 years ago
- The Community-enRiched Open WordNet (CROWN) ☆18 · Updated 9 years ago
- Visualization for hidden Markov model computations ☆14 · Updated 10 years ago
- A Recurrent Neural Network trained on all existing TED Talk Transcripts. The model outputs machine generated TED Talks. ☆51 · Updated 7 years ago
- A simple toolkit for speaker segmentation and identification ☆30 · Updated 12 years ago
- Phonetic and phonological vocoding platform ☆16 · Updated 8 years ago
- NLTK Contrib ☆166 · Updated last year
- Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipg… ☆129 · Updated 9 months ago
- An Efficient Language Model Using Double-Array Structures ☆17 · Updated 5 years ago
- Uses a distributed word representation to find words along the hyperchord of two input words. ☆102 · Updated 5 years ago
- Speech Processing & Linguistic Analysis Tool ☆12 · Updated 6 years ago
- Language Modeling with Sum-Product Networks ☆20 · Updated 11 years ago
- Unicode Text to IPA Converter ☆21 · Updated 10 years ago
- DKPro WSD: A Java framework for word sense disambiguation ☆20 · Updated 2 years ago
- NLP tools developed by Emory University. ☆61 · Updated 9 years ago
- bigram / trigram analysis of wikipedia; mainly mutual info ☆22 · Updated 13 years ago
- CS224S Course Project ☆14 · Updated 11 years ago
- Json Wikipedia, contains code to convert the Wikipedia xml dump into a json/avro dump ☆254 · Updated last year
- Zurich Morphological Lexicon for German: a tool to extract a morphological lexicon from Wiktionary ☆11 · Updated 2 years ago
- A web application for exploring documents topically. ☆26 · Updated 9 years ago