castorini / afriberta
AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages
☆80 Updated 3 years ago
Alternatives and similar repositories for afriberta
Users who are interested in afriberta are comparing it to the libraries listed below:
- MAFAND-MT ☆60 Updated last year
- ☆116 Updated 3 months ago
- A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages. ☆112 Updated last year
- Crosslingual Question Answering for African Languages ☆30 Updated last year
- Consists of the largest (10K) human annotated code-switched semantic parsing dataset & 170K generated utterances using the CST5 augmentati… ☆41 Updated 2 years ago
- ☆17 Updated 3 years ago
- Some notebooks for NLP ☆206 Updated 2 years ago
- A library to synthesize text datasets using Large Language Models (LLM) ☆152 Updated 3 years ago
- Shoonya - Platform to Annotate and label data at scale. ☆64 Updated 2 months ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset. ☆96 Updated 2 years ago
- Scripts to convert datasets from various sources to Hugging Face Datasets. ☆57 Updated 3 years ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆87 Updated last year
- ☆10 Updated last year
- indicTranslate v1 - Machine Translation for 11 Indic languages. For latest v2, check: https://github.com/AI4Bharat/IndicTrans2 ☆134 Updated 2 years ago
- TorchServe+Streamlit for easily serving your HuggingFace NER models ☆33 Updated 3 years ago
- Alternate Implementation for Zero Shot Text Classification: Instead of reframing NLI/XNLI, this reframes the text backbone of CLIP models… ☆37 Updated 3 years ago
- Open information and community for machine translation ☆80 Updated 2 months ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆156 Updated last year
- 💫 SpaCy wrapper for ConceptNet 💫 ☆95 Updated 3 weeks ago
- Pre-trained, multilingual sequence-to-sequence models for Indian languages ☆51 Updated 3 years ago
- A collection of preprocessed datasets and pretrained models for generating paraphrases. ☆32 Updated 4 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 Updated last year
- Tools for managing datasets for governance and training. ☆87 Updated last week
- A Multilingual Dataset for Parsing Realistic Task-Oriented Dialogs ☆115 Updated 2 years ago
- Load What You Need: Smaller Multilingual Transformers for PyTorch and TensorFlow 2.0. ☆105 Updated 3 years ago
- NTREX -- News Test References for MT Evaluation ☆87 Updated last year
- Using short models to classify long texts ☆21 Updated 2 years ago
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 Updated 3 years ago
- Accurate word segmentation for hashtags and text, powered by Transformers and Beam Search. A scalable alternative to heuristic splitters … ☆76 Updated 3 weeks ago
- Experiments for XLM-V Transformers Integration ☆13 Updated 2 years ago