castorini / afriberta
AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages
☆66 Updated 2 years ago
Alternatives and similar repositories for afriberta:
Users who are interested in afriberta are comparing it to the repositories listed below
- ☆107 Updated last year
- MAFAND-MT ☆55 Updated 6 months ago
- Crosslingual Question Answering for African Languages ☆29 Updated 4 months ago
- ☆17 Updated 2 years ago
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages. ☆99 Updated 9 months ago
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆111 Updated 2 months ago
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆58 Updated 2 years ago
- classy is a simple-to-use library for building high-performance Machine Learning models in NLP. ☆86 Updated 3 weeks ago
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ ☆36 Updated 2 years ago
- MasakhaNEWS: News Topic Classification for African Languages ☆18 Updated 8 months ago
- AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African languages: https://afrisenti-semeval.github.io/ ☆48 Updated last year
- A Python package to augment text data using NLP. ☆40 Updated 9 months ago
- A library to synthesize text datasets using Large Language Models (LLMs) ☆151 Updated 2 years ago
- A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models ☆31 Updated 3 years ago
- This is a neural spell checker ☆62 Updated 2 years ago
- This is a repository for NaijaSenti. A Lacuna Funded Project for the development of sentiment corpus for four Nigerian languages: Igbo, H… ☆31 Updated last year
- Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode ☆111 Updated 2 years ago
- ☆14 Updated 9 months ago
- AfroLID, a powerful neural toolkit for African languages identification which covers 517 African languages. ☆30 Updated last year
- Some notebooks for NLP ☆192 Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset. ☆93 Updated last year
- A spaCy custom component that extracts and normalizes temporal expressions ☆52 Updated last year
- Consists of the largest (10K) human-annotated code-switched semantic parsing dataset & 170K generated utterances using the CST5 augmentati… ☆35 Updated last year
- ☆51 Updated last year
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 Updated 10 months ago
- Scripts for working with open.bible data ☆24 Updated 3 years ago
- Execute arbitrary SQL queries on 🤗 Datasets ☆32 Updated last year
- 📝 An easy-to-use package to restore punctuation of the text. ☆111 Updated last year
- HF's ML for Audio study group ☆191 Updated last year
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 Updated 8 months ago