Softcatala / ca-text-corpus
Public domain corpus of Catalan text
☆16 · Updated 3 years ago
Alternatives and similar repositories for ca-text-corpus:
Users who are interested in ca-text-corpus are comparing it to the repositories listed below
- The curation repository for the data behind Concepticon. ☆38 · Updated 2 months ago
- Tools and scripts for working with ELAN ☆10 · Updated 2 years ago
- Apertium linguistic data for Catalan ☆11 · Updated last week
- The Unicode Cookbook for Linguists ☆53 · Updated 4 years ago
- A Python module for interfacing with the Treetagger by Helmut Schmid. ☆75 · Updated 3 years ago
- CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates ☆48 · Updated last year
- CLDF: Cross-Linguistic Data Formats - the specification ☆57 · Updated last year
- Recipes for cooking with CLDF data ☆17 · Updated 4 months ago
- VoxAngeles Corpus ☆11 · Updated last year
- universal syllabification algorithms ☆44 · Updated 2 years ago
- Finite state and Constraint Grammar based analysers and proofing tools, and language resources for the Plains Cree language ☆16 · Updated this week
- A Python toolkit converting pronunciation in enwiktionary xml dump to cmudict format ☆33 · Updated 5 years ago
- Official source for Catalan Language Models and resources made within Aina project. ☆24 · Updated last year
- PHOIBLE Online ☆42 · Updated 2 years ago
- eXtensible Interlinear Glossed Text ☆33 · Updated 2 years ago
- A versioned python wrapper package for cmudict (https://github.com/cmusphinx/cmudict). ☆62 · Updated 2 weeks ago
- ☆32 · Updated 3 years ago
- Featurize words into orthographic and phonological vectors. ☆40 · Updated last year
- ☆28 · Updated 3 weeks ago
- PHOIBLE data and development. ☆125 · Updated 9 months ago
- Deepspeech ASR Model for the Catalan Language ☆17 · Updated 4 years ago
- Study on lexibank data (presenting the lexibank dataset). ☆12 · Updated 2 weeks ago
- SegBo: A database of borrowed sounds in the world’s languages ☆16 · Updated last year
- Catalan ALBERT (A Lite BERT for self-supervised learning of language representations) ☆14 · Updated 4 years ago
- Text-to-Speech converter for Basque and Spanish. It includes linguistic processing and built voices for the aforementioned languages. Its… ☆13 · Updated 11 months ago
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ☆34 · Updated last year
- A lexicon compiler for non-suffixational morphologies ☆12 · Updated 2 weeks ago
- Austronesian Comparative Dictionary ☆13 · Updated 3 months ago
- LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilatio… ☆68 · Updated last year
- Tool to collect and review sentences for Common Voice ☆81 · Updated last year