neocl / speach
🐍🍑 Python 3 library for managing, annotating, and converting natural language corpuses using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF, etc.)
☆18 Updated 4 months ago
Related projects
Alternatives and complementary repositories for speach
- Unicode Standard tokenization routines and orthography profile segmentation ☆33 Updated 2 years ago
- MAMMOTH: MAssively Multilingual Modular Open Translation @ Helsinki ☆22 Updated this week
- A tiny BERT for low-resource monolingual models ☆29 Updated last month
- Finite-state script normalization and processing utilities ☆38 Updated this week
- List of corpora annotated for coreference for different languages ☆17 Updated 3 months ago
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests ☆41 Updated 2 years ago
- Proposed splits for the LREC Wikipron paper ☆13 Updated 4 years ago
- Multilingual Open Text ☆25 Updated 3 weeks ago
- Python Finite-State Toolkit ☆45 Updated last week
- Breaks a word into syllables using an LSTM-based neural network. ☆19 Updated last year
- Suite for phonetic word embeddings, especially their evaluation and baseline models. ☆24 Updated 3 weeks ago
- MAGPIE: A sense-annotated corpus of potentially idiomatic expressions ☆25 Updated 4 years ago
- The EveryVoice TTS Toolkit - Text To Speech for your language ☆21 Updated this week
- Forced Alignments for Common Voice ☆31 Updated 4 years ago
- STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, supersenses) ☆63 Updated last year
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆67 Updated 6 months ago
- Gamma Agreement in Python ☆43 Updated 8 months ago
- Python package for calculating well-known measures in computational linguistics ☆13 Updated 2 weeks ago
- This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks. ☆72 Updated last year
- A Python module to process data for Frame Semantic Parsing ☆23 Updated 4 years ago
- Scripts to create speech corpora from open.bible ☆12 Updated 2 years ago
- A guide to building language technology in new languages. ☆57 Updated 2 years ago
- Bilingual sentence similarity classifier using Tensorflow ☆19 Updated 5 years ago
- Python framework for processing Universal Dependencies data ☆57 Updated this week
- Bicleaner fork that uses neural networks ☆38 Updated 3 months ago
- A Python toolkit converting pronunciation in enwiktionary xml dump to cmudict format ☆33 Updated 5 years ago
- phone inventory library ☆15 Updated last year
- MaSS - Multilingual corpus of Sentence-aligned Spoken utterances ☆48 Updated 2 months ago
- Transform TMX to text ☆29 Updated last year
- ☆19 Updated 3 years ago