castorini / afriberta
AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages
☆75 · Updated 3 years ago
Alternatives and similar repositories for afriberta
Users that are interested in afriberta are comparing it to the libraries listed below
- MAFAND-MT ☆57 · Updated last year
- ☆110 · Updated last year
- Crosslingual Question Answering for African Languages ☆31 · Updated 11 months ago
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages. ☆109 · Updated last year
- ☆17 · Updated 2 years ago
- Some notebooks for NLP ☆207 · Updated last year
- A library to synthesize text datasets using Large Language Models (LLMs) ☆152 · Updated 2 years ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data; it should also work with any Hugging Face text dataset. ☆93 · Updated 2 years ago
- Tools for managing datasets for governance and training. ☆85 · Updated 2 weeks ago
- Command Line Interface for Hugging Face Inference Endpoints ☆66 · Updated last year
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆83 · Updated 11 months ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆156 · Updated last year
- NeatText: a simple NLP package for cleaning textual data and text preprocessing ☆72 · Updated last year
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆57 · Updated 2 years ago
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ ☆36 · Updated 3 years ago
- Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188) ☆61 · Updated 2 years ago
- Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode ☆111 · Updated 3 years ago
- Machine translation (MT) benchmark dataset for languages in the Horn of Africa. ☆40 · Updated 2 years ago
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆149 · Updated 2 months ago
- Pipeline for pulling and processing online language model pretraining data from the web ☆177 · Updated 2 years ago
- NTREX -- News Test References for MT Evaluation ☆85 · Updated last year
- A Python library aimed at dissecting and augmenting NER training data. ☆58 · Updated 2 years ago
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 · Updated 3 years ago
- Explainable Zero-Shot Topic Extraction ☆63 · Updated last year
- Experiments for XLM-V Transformers Integration ☆13 · Updated 2 years ago
- ☆359 · Updated last year
- Accelerated NLP pipelines for fast inference on CPU. Built with Transformers and ONNX Runtime. ☆127 · Updated 4 years ago
- This repository contains an easy and intuitive approach to using SetFit in combination with spaCy. ☆80 · Updated 2 years ago
- ☆106 · Updated 8 months ago
- A Multilingual Dataset for Parsing Realistic Task-Oriented Dialogs ☆116 · Updated 2 years ago