josecannete / spanish-corpora
Unannotated Spanish 3 Billion Words Corpora
☆104 · Oct 20, 2022 · Updated 3 years ago
Alternatives and similar repositories for spanish-corpora
Users who are interested in spanish-corpora are comparing it to the libraries listed below.
- BETO - Spanish version of the BERT model ☆499 · Oct 21, 2023 · Updated 2 years ago
- Spanish Billion Word Corpus and Embeddings ☆52 · Dec 16, 2022 · Updated 3 years ago
- ☆43 · Apr 26, 2025 · Updated 9 months ago
- Spanish word embeddings computed with different methods and from different corpora ☆364 · Oct 9, 2019 · Updated 6 years ago
- Benchmarks for Evaluating Spanish Language Models ☆11 · Jun 14, 2023 · Updated 2 years ago
- ☆11 · Feb 11, 2020 · Updated 6 years ago
- My replication code for the AlexNet paper. ☆14 · Nov 14, 2022 · Updated 3 years ago
- ALBETO and DistilBETO are versions of ALBERT and DistilBERT pre-trained exclusively on Spanish corpora. ☆40 · Feb 7, 2023 · Updated 3 years ago
- Spanish rule-based lemmatization for spaCy ☆40 · Apr 19, 2022 · Updated 3 years ago
- CuratorNet: Visually-aware Recommendation of Art Images ☆13 · Dec 14, 2021 · Updated 4 years ago
- A corpus of speech from the Joe Rogan Experience podcast, consisting of 8.43 million words. It includes aligned TextGrids with phonetic a… ☆21 · Jan 26, 2020 · Updated 6 years ago
- Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form. It has a highly es… ☆19 · Jun 14, 2021 · Updated 4 years ago
- Python implementation of CTC beam search decoder + agnostic LM scorer ☆20 · Dec 16, 2020 · Updated 5 years ago
- Language Acquisition Research Tools ☆43 · Nov 16, 2025 · Updated 3 months ago
- This is a clip. ☆21 · Jan 23, 2023 · Updated 3 years ago
- A tool to collect/validate audio recordings from workers on Amazon Mechanical Turk. Written in Python/Flask. (originally hosted on github… ☆14 · Dec 19, 2022 · Updated 3 years ago
- Factorization of the neural parameter space for zero-shot multi-lingual and multi-task transfer ☆39 · Sep 22, 2020 · Updated 5 years ago
- Official source for Spanish language models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL). ☆262 · Jul 27, 2023 · Updated 2 years ago
- WEFE: The Word Embeddings Fairness Evaluation Framework. WEFE is a framework that standardizes the bias measurement and mitigation in Wor… ☆182 · Nov 24, 2025 · Updated 2 months ago
- ☆11 · Oct 19, 2024 · Updated last year
- VoxAngeles Corpus ☆13 · Aug 23, 2025 · Updated 5 months ago
- PyTorch implementation of TinyWASE described in our paper "Compressing Speaker Extraction Model with Ultra-low Precision Quantization and… ☆11 · Jun 28, 2021 · Updated 4 years ago
- A corpus of diacritized Hebrew texts (טקסט מנוקד) ☆11 · May 4, 2022 · Updated 3 years ago
- My public domain speech index ☆13 · Sep 19, 2019 · Updated 6 years ago
- Several algorithms for converting dependency structures into constituency structures. ☆10 · Feb 7, 2022 · Updated 4 years ago
- ☆10 · Mar 20, 2021 · Updated 4 years ago
- LLM-aided data filtering ☆14 · Dec 3, 2024 · Updated last year
- Ready to use Spanish Word2Vec embeddings created from >18B chars and >3B words ☆44 · Jun 22, 2019 · Updated 6 years ago
- A collection of utilities for handling IPA phones. ☆26 · Sep 24, 2023 · Updated 2 years ago
- A town name generator using an LSTM neural network ☆14 · Mar 24, 2023 · Updated 2 years ago
- This repo contains the baseline model recipes and pre-trained model for GramVanni Hindi ASR challenge ☆15 · Mar 26, 2022 · Updated 3 years ago
- Simple Kaldi recipe for forced alignment ☆11 · Jul 16, 2023 · Updated 2 years ago
- Thai Grapheme to Phoneme (G2P) Wiktionary Corpus ☆13 · Jul 25, 2022 · Updated 3 years ago
- Implementation of different noise embeddings for noise aware training of Kaldi acoustic models. ☆13 · Feb 13, 2021 · Updated 5 years ago
- ☆26 · Apr 21, 2021 · Updated 4 years ago
- An implementation of the Wav2Letter Speech-to-Text model using PyTorch. ☆14 · Mar 8, 2023 · Updated 2 years ago
- Digital Speech Processing in PyTorch. ☆15 · Aug 12, 2022 · Updated 3 years ago
- Cross-Linguistic Transcription Systems ☆17 · Dec 17, 2024 · Updated last year
- Annotations and scripts for use with University of Wisconsin X-Ray Microbeam Speech Production Database (1994) ☆13 · Oct 8, 2020 · Updated 5 years ago