mpacula / AutoCorpus
AutoCorpus is a set of utilities that enable automatic extraction of language corpora and language models from publicly available datasets. AutoCorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.
☆37 Updated 13 years ago
Alternatives and similar repositories for AutoCorpus
Users who are interested in AutoCorpus are comparing it to the libraries listed below.
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests ☆41 Updated 3 years ago
- A Recurrent Neural Network trained on all existing TED Talk Transcripts. The model outputs machine generated TED Talks. ☆51 Updated 7 years ago
- Generalized Language Modeling toolkit ☆51 Updated 3 years ago
- Speech Processing & Linguistic Analysis Tool ☆11 Updated 6 years ago
- The Kyoto Language Modeling Toolkit ☆27 Updated 11 years ago
- NLTK Contrib ☆168 Updated last year
- Visualization for hidden Markov model computations ☆14 Updated 10 years ago
- This is the EllaVator project to build Ella, the talking eleVator, as part of a Saarland University software project class. ☆17 Updated 9 years ago
- http://www.ark.cs.cmu.edu/ARKref/ ☆32 Updated 11 years ago
- Uses a distributed word representation to find words along the hyperchord of two input words. ☆102 Updated 5 years ago
- Turbo topics find significant multiword phrases in topics. ☆46 Updated 10 years ago
- bigram / trigram analysis of Wikipedia; mainly mutual information ☆22 Updated 13 years ago
- Excitement Open Platform for Recognizing Textual Entailments ☆88 Updated 8 years ago
- Fast Word Clustering Software ☆79 Updated 9 months ago
- Grapheme to phoneme toolkit using joint-modelling + CRFs in Java ☆14 Updated 7 years ago
- A Combinatory Categorial Grammar library. ☆22 Updated 12 years ago
- A visualizer for multi-dimensional semantic data ☆38 Updated 14 years ago
- bilingual dictionary extractor from parallel corpora ☆22 Updated 11 years ago
- The Community-enRiched Open WordNet (CROWN) ☆18 Updated 9 years ago
- Phonetic and phonological vocoding platform ☆16 Updated 9 years ago
- DKPro WSD: A Java framework for word sense disambiguation ☆20 Updated 3 years ago
- pronunciation LEXicons for Any Low-resource Language ☆21 Updated 5 years ago
- ThoughtTreasure commonsense knowledge base and architecture for natural language processing ☆79 Updated 10 years ago
- Speech act classifier for text based on Stanford CoreNLP and Weka ☆35 Updated 10 years ago
- Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles. ☆38 Updated 7 years ago
- Barista is an open-source framework for concurrent speech processing. ☆36 Updated 11 years ago
- Pitman-Yor processes in Python ☆26 Updated 11 years ago
- English Dependency Relationship Extractor ☆86 Updated last month
- Hierarchical phrase-based machine translation system ☆32 Updated 10 years ago
- Topic Model Analyzer ☆62 Updated 10 years ago