neocl / speach
Python 3 library for managing, annotating, and converting natural language corpuses using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF, etc.)
☆17 · Updated 7 months ago
Alternatives and similar repositories for speach:
Users who are interested in speach are comparing it to the libraries listed below.
- Unicode Standard tokenization routines and orthography profile segmentation ☆34 · Updated 2 years ago
- Proposed splits for the LREC Wikipron paper ☆14 · Updated 4 years ago
- Python Finite-State Toolkit ☆48 · Updated last week
- ☆22 · Updated 2 years ago
- List of corpora annotated for coreference for different languages ☆17 · Updated 5 months ago
- A tiny BERT for low-resource monolingual models ☆31 · Updated 4 months ago
- This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks. ☆72 · Updated last year
- A guide to building language technology in new languages. ☆58 · Updated 2 years ago
- ☆68 · Updated 5 months ago
- The Fisher and CALLHOME Spanish–English Speech Translation Corpus ☆39 · Updated 2 years ago
- Corpus preprocessing ☆95 · Updated 10 months ago
- Supplementary material for "When and Why Are Pre-trained Word Embeddings Useful for Neural Machine Translation?" at NAACL 2018 ☆121 · Updated 4 years ago
- ☆44 · Updated 6 months ago
- SHAS: Approaching optimal Segmentation for End-to-End Speech Translation ☆37 · Updated last year
- Multilingual Open Text ☆25 · Updated 3 months ago
- Forced Alignments for Common Voice ☆31 · Updated 4 years ago
- Universal Romanizer that can convert any unicode script to roman (latin) script ☆169 · Updated 6 months ago
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst… ☆22 · Updated 3 years ago
- Finite-state script normalization and processing utilities ☆38 · Updated last week
- This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to… ☆42 · Updated 3 years ago
- STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, supersenses) ☆64 · Updated 2 years ago
- BERT models for many languages created from Wikipedia texts ☆34 · Updated 4 years ago
- SIGMORPHON 2022 Shared Task on Morpheme Segmentation ☆24 · Updated last year
- Breaks a word into syllables using an LSTM-based neural network. ☆19 · Updated last year
- Gamma Agreement in Python ☆43 · Updated 10 months ago
- MAMMOTH: MAssively Multilingual Modular Open Translation @ Helsinki ☆22 · Updated last month
- OpusFilter - Parallel corpus processing toolkit ☆104 · Updated this week
- A phoneme-allophone database for many languages ☆48 · Updated 4 years ago
- phone inventory library ☆16 · Updated last year
- A toolkit for producing n-gram language models. The highlights are the implementation of Kneser-Ney growing and revised Kneser pruning me… ☆40 · Updated 4 months ago