neocl / speach
Python 3 library for managing, annotating, and converting natural language corpuses using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF, etc.)
☆17 · Updated 2 months ago
Related projects:
- Python Finite-State Toolkit ☆39 · Updated last month
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆65 · Updated 4 months ago
- Finite-state script normalization and processing utilities ☆36 · Updated this week
- AfroLID, a powerful neural toolkit for African languages identification which covers 517 African languages. ☆27 · Updated last year
- MAMMOTH: MAssively Multilingual Modular Open Translation @ Helsinki ☆21 · Updated this week
- These are lists for a variety of languages containing words that are distinctive to each language. ☆34 · Updated 2 years ago
- A survey of corpora for Germanic low-resource languages and dialects ☆24 · Updated last month
- ParaNames: A multilingual resource for parallel names ☆30 · Updated 4 months ago
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst… ☆19 · Updated 2 years ago
- Transform TMX to text ☆29 · Updated last year
- Multilingual Open Text ☆25 · Updated 5 months ago
- Curriculum training ☆15 · Updated this week
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT ☆21 · Updated last year
- ☆19 · Updated 2 years ago
- ☆67 · Updated last month
- MAGPIE: A sense-annotated corpus of potentially idiomatic expressions ☆25 · Updated 4 years ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆45 · Updated last week
- Tool to fix bitexts and tag near-duplicates for removal ☆29 · Updated last month
- Bicleaner fork that uses neural networks ☆37 · Updated last month
- Bilingual sentence similarity classifier using Tensorflow ☆19 · Updated 4 years ago
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation) ☆36 · Updated last year
- Multilingual syllable annotation pipeline component for spacy ☆34 · Updated last year
- Small-vocabulary sequence-to-sequence generation with optional feature conditioning ☆29 · Updated last week
- SIGMORPHON 2022 Shared Task on Morpheme Segmentation ☆23 · Updated last year
- Gamma Agreement in Python ☆43 · Updated 6 months ago
- MultiLexNorm 2021 competition system from ÚFAL ☆15 · Updated 2 years ago
- BERT models for many languages created from Wikipedia texts ☆34 · Updated 4 years ago
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation ☆12 · Updated 3 weeks ago
- Corpus preprocessing ☆95 · Updated 6 months ago
- Automatic extraction of edited sentences from text edition histories. ☆80 · Updated 2 years ago