castorini / afriberta
AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages
☆69 Updated 2 years ago
Alternatives and similar repositories for afriberta:
Users who are interested in afriberta are comparing it to the repositories listed below
- Crosslingual Question Answering for African Languages ☆29 Updated 5 months ago
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages. ☆102 Updated 10 months ago
- MAFAND-MT ☆55 Updated 7 months ago
- This repository contains an easy and intuitive approach to using SetFit in combination with spaCy. ☆76 Updated last year
- MasakhaNEWS: News Topic Classification for African Languages ☆19 Updated 9 months ago
- This is a neural spell checker ☆65 Updated 2 years ago
- AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African languages: https://afrisenti-semeval.github.io/ ☆48 Updated last year
- spaCy match and replace, maintaining conjugation ☆35 Updated 2 years ago
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆118 Updated 3 months ago
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ ☆36 Updated 2 years ago
- A library to synthesize text datasets using Large Language Models (LLMs) ☆151 Updated 2 years ago
- AfroLID, a powerful neural toolkit for African language identification which covers 517 African languages. ☆31 Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset. ☆93 Updated 2 years ago
- This is a repository for NaijaSenti. A Lacuna Funded Project for the development of sentiment corpus for four Nigerian languages: Igbo, H… ☆31 Updated last year
- Machine translation (MT) benchmark dataset for languages in the Horn of Africa. ☆39 Updated 2 years ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 Updated 9 months ago
- A spaCy custom component that extracts and normalizes temporal expressions ☆54 Updated 2 years ago
- A simple semi-supervised approach for creating Hugging Face data script loaders and uploading them to the Hub. ☆11 Updated 8 months ago
- SpaCyEx allows the creation of spaCy Matcher patterns with RegEx-like syntax. ☆59 Updated 10 months ago
- In-the-wild extraction of entities that are found using Flair and displayed using a very elegant front-end. ☆71 Updated 2 years ago
- A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models ☆31 Updated 3 years ago
- classy is a simple-to-use library for building high-performance Machine Learning models in NLP. ☆86 Updated last month
- Implementation of "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" paper, accepted to E… ☆25 Updated 2 years ago
- Data, Embeddings, Stopword lists, code, and baselines for COLING 2020 paper titled "KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text … ☆12 Updated 10 months ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆77 Updated 5 months ago
- Consists of the largest (10K) human annotated code-switched semantic parsing dataset & 170K generated utterance using the CST5 augmentati… ☆37 Updated 2 years ago
- Scripts to create speech corpora from open.bible ☆13 Updated 3 years ago