castorini / afriberta
AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages
☆66 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for afriberta
- ☆105 · Updated 11 months ago
- Crosslingual Question Answering for African Languages ☆29 · Updated last month
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages. ☆93 · Updated 6 months ago
- MAFAND-MT ☆55 · Updated 4 months ago
- ☆16 · Updated last year
- MasakhaNEWS: News Topic Classification for African Languages ☆18 · Updated 6 months ago
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆58 · Updated 2 years ago
- A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models ☆31 · Updated 3 years ago
- A library to synthesize text datasets using Large Language Models (LLM) ☆151 · Updated last year
- A simple semi-supervised approach for creating Hugging Face dataset loader scripts and uploading them to the Hub. ☆11 · Updated 5 months ago
- AfroLID, a powerful neural toolkit for African language identification that covers 517 African languages. ☆28 · Updated last year
- ☆42 · Updated last year
- AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African Languages: https://afrisenti-semeval.github.io/ ☆46 · Updated 10 months ago
- Alternate Implementation for Zero Shot Text Classification: Instead of reframing NLI/XNLI, this reframes the text backbone of CLIP models… ☆37 · Updated 2 years ago
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆92 · Updated 3 weeks ago
- All my experiments with the various transformers and various transformer frameworks available ☆14 · Updated 3 years ago
- This is a repository for NaijaSenti, a Lacuna Fund project for the development of a sentiment corpus for four Nigerian languages: Igbo, H… ☆31 · Updated 10 months ago
- An easy-to-use Python module that helps you to extract the BERT embeddings for a large text dataset (Bengali/English) efficiently. ☆37 · Updated last year
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 · Updated 6 months ago
- Instruction dataset for Arabic with 10,000 instruction and output pairs. CIDAR can be used to fine-tune LLMs to follow instructions. ☆33 · Updated 9 months ago
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ ☆36 · Updated 2 years ago
- Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode ☆110 · Updated 2 years ago
- German small and large versions of GPT2. ☆20 · Updated 2 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆62 · Updated 8 months ago
- AraT5: Text-to-Text Transformers for Arabic Language Understanding ☆85 · Updated 6 months ago
- Shoonya: a platform to annotate and label data at scale. ☆50 · Updated 2 months ago
- Machine translation (MT) benchmark dataset for languages in the Horn of Africa. ☆40 · Updated 2 years ago
- Python-based implementation of the Translate-Align-Retrieve method to automatically translate the SQuAD Dataset to Spanish. ☆59 · Updated last year
- ☆40 · Updated last year
- An extension package of 🤗 Datasets that provides support for executing arbitrary SQL queries on HF datasets ☆31 · Updated 10 months ago