josecannete/spanish-corpora

Unannotated Spanish 3 Billion Words Corpora

☆105

Alternatives and similar repositories for spanish-corpora

Users that are interested in spanish-corpora are comparing it to the libraries listed below.

crscardellino / sbwce
Spanish Billion Word Corpus and Embeddings
☆54 · Dec 16, 2022 · Updated 3 years ago
dccuchile / beto
BETO - Spanish version of the BERT model
☆502 · Oct 21, 2023 · Updated 2 years ago
dccuchile / GLUES
Resources for GLUE benchmark in Spanish
☆15 · Mar 29, 2021 · Updated 5 years ago
jorgeortizfuentes / spanish_nlp
☆43 · Apr 26, 2025 · Updated last year
OpenCENIA / Spanish-Sentence-Evaluation
Benchmarks for Evaluating Spanish Language Models
☆11 · Jun 14, 2023 · Updated 3 years ago
aitoralmeida / spanish_word2vec
Ready to use Spanish Word2Vec embeddings created from >18B chars and >3B words
☆44 · Jun 22, 2019 · Updated 7 years ago
dccuchile / lightweight-spanish-language-models
ALBETO and DistilBETO are versions of ALBERT and DistilBERT pre-trained exclusively on Spanish corpora.
☆39 · Feb 7, 2023 · Updated 3 years ago
jacksonllee / pylangacq
Language Acquisition Research Tools
☆45 · May 19, 2026 · Updated 2 months ago
rdenadai / BR-BERTo
Transformer model for Portuguese language (Brazil pt_BR)
☆16 · Jul 13, 2026 · Updated 2 weeks ago
Linguistic-Data-Consortium / ldc-bpcsad
A speech activity detector using HMMs
☆11 · Feb 11, 2026 · Updated 5 months ago
dccuchile / rivertext
RiverText is a framework that standardizes the Incremental Word Embeddings proposed in the state-of-art. Please feel welcome to open an i…
☆24 · Feb 26, 2025 · Updated last year
dccuchile / wefe
WEFE: The Word Embeddings Fairness Evaluation Framework. WEFE is a framework that standardizes the bias measurement and mitigation in Wor…
☆181 · Nov 24, 2025 · Updated 8 months ago
ialab-puc / CuratorNet
CuratorNet: Visually-aware Recommendation of Art Images
☆13 · Dec 14, 2021 · Updated 4 years ago
psoulos / role-decomposition
☆11 · Feb 11, 2020 · Updated 6 years ago
EducationalTestingService / gug-data
A dataset of sentences with ordinal labels for grammaticality
☆29 · Jun 9, 2014 · Updated 12 years ago
PlanTL-GOB-ES / lm-spanish
Official source for Spanish Language Models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).
☆265 · Jul 27, 2023 · Updated 3 years ago
cldf-clts / clts
Cross-Linguistic Transcription Systems
☆17 · Mar 20, 2026 · Updated 4 months ago
jhasegaw / phonecodes
Python code for converting among IPA, ARPABET, XSAMPA, Callhome, DISC, TIMIT, plus some lexical tones.
☆44 · Jun 18, 2026 · Updated last month
wenkokke / dep2con
Several algorithms for converting dependency structures into constituency structures.
☆10 · Feb 7, 2022 · Updated 4 years ago
matthewmorrone / cmudict-ipa
CMU dictionary in IPA instead of their subset of Arpabet
☆16 · Jun 21, 2026 · Updated last month
dcavar / spaCy-JSON-NLP
spaCy wrapper for JSON-NLP.
☆12 · Aug 11, 2019 · Updated 6 years ago
fabianoluzbr / neural-g2p-portuguese
Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form. It has a highly es…
☆19 · Jun 14, 2021 · Updated 5 years ago
zjlww / dsp
Digital Speech Processing in PyTorch.
☆15 · Aug 12, 2022 · Updated 3 years ago
mborsdorf / UniversalSpeakerExtraction
☆15 · Sep 6, 2021 · Updated 4 years ago
igormq / ctcdecode-pytorch
Python implementation of CTC beam search decoder + agnostic LM scorer
☆20 · Dec 16, 2020 · Updated 5 years ago
ljuvela / GELP
☆27 · Apr 21, 2021 · Updated 5 years ago
guillaume-be / SentencePiece-Rust-example
Supporting example for "A Rust SentencePiece implementation"
☆20 · Jun 7, 2020 · Updated 6 years ago
lociko / ukraine_itn_wfst
Simple WFST for Ukrainian ITN based on NVIDIA NeMo and Pynini
☆19 · Oct 21, 2025 · Updated 9 months ago
arysin / nlp_uk_api
☆11 · Oct 19, 2024 · Updated last year
avijit-thawani / SWOW-eval
Intrinsic Evaluation of pre-trained word embeddings, using large Word Association Dataset: SWOW (Small World of Words)
☆11 · Feb 28, 2024 · Updated 2 years ago
delph-in / erg
English Resource Grammar
☆30 · May 22, 2026 · Updated 2 months ago
gpu-poor / gramvaani_hindi_asr
This repo contains the baseline model recipes and pre-trained model for the GramVaani Hindi ASR challenge
☆16 · Mar 26, 2022 · Updated 4 years ago
ialab-puc / cluster
IALAB cluster. Documentation, scripts, configuration files.
☆20 · Jul 5, 2024 · Updated 2 years ago
symanto-research / few-shot-learning-label-tuning
A few-shot learning method based on siamese networks.
☆28 · Feb 20, 2023 · Updated 3 years ago
pywirrarika / naki
List of research and engineering of NLP for American Native/Indigenous Languages.
☆94 · Nov 23, 2020 · Updated 5 years ago
stylerw / thymedata
This is a repository for annotation data for the THYME Project, a clinical natural language processing project dedicated to extracting us…
☆36 · Jun 1, 2026 · Updated last month
amazon-science / proteno
This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to…
☆45 · May 25, 2021 · Updated 5 years ago
MiniXC / phones
A collection of utilities for handling IPA phones.
☆27 · Sep 24, 2023 · Updated 2 years ago
remyang55 / tranzlate
Automatic transcription and translation for Zoom meetings
☆13 · Sep 7, 2020 · Updated 5 years ago