masakhane-io / masakhane-news
MasakhaNEWS: News Topic Classification for African Languages
☆16 · Updated 4 months ago
Related projects:
- MAFAND-MT · ☆52 · Updated 2 months ago
- Crosslingual Question Answering for African Languages · ☆27 · Updated 2 months ago
- ☆22 · Updated last year
- ☆16 · Updated last year
- Using short models to classify long texts · ☆20 · Updated last year
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod… · ☆12 · Updated last year
- Ranking of fine-tuned HF models as base models. · ☆35 · Updated last year
- ☆22 · Updated 2 years ago
- Experiments for XLM-V Transformers Integration · ☆13 · Updated last year
- Semantically Structured Sentence Embeddings · ☆65 · Updated 10 months ago
- A package for fine-tuning pretrained NLP transformers using semi-supervised learning · ☆15 · Updated 2 years ago
- ☆29 · Updated last year
- Code for our paper "Resources and Evaluations for Multi-Distribution Dense Information Retrieval" · ☆14 · Updated 8 months ago
- ☆51 · Updated last year
- triple-encoders is a library for contextualizing distributed Sentence Transformers representations. · ☆12 · Updated 2 weeks ago
- This repository hosts my experiments for the project I did with OffNote Labs. · ☆11 · Updated 3 years ago
- BLOOM+1: Adapting BLOOM model to support a new unseen language · ☆69 · Updated 6 months ago
- Minimum Bayes Risk Decoding for Hugging Face Transformers · ☆51 · Updated 3 months ago
- Code for the ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" · ☆30 · Updated 2 years ago
- ☆17 · Updated 2 months ago
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation" · ☆10 · Updated 7 months ago
- Embedding Recycling for Language models · ☆38 · Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… · ☆42 · Updated 10 months ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages (ACL 2023) · ☆96 · Updated 4 months ago
- ☆19 · Updated last year
- ☆15 · Updated last month
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. · ☆11 · Updated 9 months ago
- Observe the slow deterioration of my mental sanity in the GitHub commit history · ☆13 · Updated last year
- Training a model without a dataset for natural language inference (NLI) · ☆25 · Updated 4 years ago
- SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects · ☆16 · Updated 7 months ago