mpacula / AutoCorpus
AutoCorpus is a set of utilities that enable automatic extraction of language corpora and language models from publicly available datasets. AutoCorpus utilities follow the Unix design philosophy and integrate easily into custom data processing pipelines.
☆37Updated 13 years ago
Alternatives and similar repositories for AutoCorpus
Users who are interested in AutoCorpus are comparing it to the libraries listed below.
- A Recurrent Neural Network trained on all existing TED Talk Transcripts. The model outputs machine generated TED Talks.☆51Updated 7 years ago
- Generalized Language Modeling toolkit☆51Updated 3 years ago
- NLTK Contrib☆168Updated last year
- A visualizer for multi-dimensional semantic data☆38Updated 14 years ago
- Visualization for hidden Markov model computations☆14Updated 11 years ago
- Vector Space Model Framework developed for InPhO☆39Updated 8 months ago
- Uses a distributed word representation to find words along the hyperchord of two input words.☆102Updated 5 years ago
- A JavaScript demo of some multi-armed bandits algorithms☆39Updated 5 years ago
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests☆41Updated 3 years ago
- DKPro WSD: A Java framework for word sense disambiguation☆20Updated 3 years ago
- The Community-enRiched Open WordNet (CROWN)☆18Updated 10 years ago
- A simple toolkit for speaker segmentation and identification☆31Updated 12 years ago
- Compute association strength over semantic networks in a dimensionality-reduced form.☆32Updated 10 years ago
- Raplysaattori is software that detects rhymes and computes their lengths in English / Finnish rap lyrics.☆70Updated 7 years ago
- Barista is an open-source framework for concurrent speech processing.☆36Updated 11 years ago
- English Dependency Relationship Extractor☆86Updated 2 months ago
- A Python library for learning from dimensionality reduction, supporting sparse and dense matrices.☆78Updated 8 years ago
- Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic pr…☆70Updated 3 weeks ago
- bigram / trigram analysis of wikipedia; mainly mutual info☆22Updated 13 years ago
- Speech Processing & Linguistic Analysis Tool☆11Updated 6 years ago
- An implementation of latent Dirichlet allocation in javascript☆185Updated 3 years ago
- The Kyoto Language Modeling Toolkit☆27Updated 11 years ago
- NIST Language i-vector Machine Learning Challenge☆27Updated 9 years ago
- Excitement Open Platform for Recognizing Textual Entailments☆89Updated 8 years ago
- http://www.ark.cs.cmu.edu/ARKref/☆32Updated 11 years ago
- a port of the Wavenet algorithm to generate poems (using Samuel Graván's @Zeta36 code).☆36Updated 8 years ago
- rapid nlp prototyping☆71Updated 3 years ago
- Json Wikipedia, contains code to convert the Wikipedia xml dump into a json/avro dump☆255Updated 2 years ago
- ThoughtTreasure commonsense knowledge base and architecture for natural language processing☆79Updated 10 years ago
- Parsito: Fast non-projective transition-based dependency parser☆14Updated last month