semantic-systems / amharic-qa
AmQA - The first Amharic Open Domain Question Answering Dataset
☆12 Updated 8 months ago
Alternatives and similar repositories for amharic-qa:
Users who are interested in amharic-qa are comparing it to the libraries listed below.
- notebooks to finetune `bert-small-amharic`, `bert-mini-amharic`, and `xlm-roberta-base` models using an Amharic text classification datas… ☆10 Updated 9 months ago
- Different semantic models for Amharic ☆17 Updated last year
- ☆108 Updated last year
- ☆44 Updated 3 years ago
- MAFAND-MT ☆55 Updated 7 months ago
- A Multilingual Replicable Instruction-Following Model ☆94 Updated last year
- ☆11 Updated 7 months ago
- Consists of the largest (10K) human annotated code-switched semantic parsing dataset & 170K generated utterance using the CST5 augmentati… ☆37 Updated 2 years ago
- PyTorch implementation of NMT models along with custom tokenizers, models, and datasets ☆20 Updated 2 years ago
- ☆17 Updated 2 years ago
- This repository hosts my experiments for the project I did with OffNote Labs. ☆10 Updated 3 years ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ☆30 Updated 2 years ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆99 Updated 10 months ago
- ☆12 Updated 4 years ago
- Source code for the GPT-2 story generation models in the EMNLP 2020 paper "STORIUM: A Dataset and Evaluation Platform for Human-in-the-Lo… ☆39 Updated last year
- NTREX -- News Test References for MT Evaluation ☆81 Updated 8 months ago
- PALI: Language identification for Perso-Arabic Scripts ☆9 Updated last year
- This repository contains datasets and code for the paper "HINT3: Raising the bar for Intent Detection in the Wild" accepted at EMNLP-2020… ☆33 Updated 3 years ago
- SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects ☆20 Updated 3 weeks ago
- Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks. ☆98 Updated 2 months ago
- Code of NAACL 2022 "Efficient Hierarchical Domain Adaptation for Pretrained Language Models" paper. ☆32 Updated last year
- ☆44 Updated 3 months ago
- MultiOCR, an interface that connects multiple open-source OCR and various Cloud OCR. ☆31 Updated last year
- LAReQA is a challenging benchmark for evaluating language agnostic answer retrieval from a multilingual candidate pool. This repository c… ☆14 Updated 4 years ago
- ☆38 Updated 2 years ago
- ☆24 Updated 2 years ago
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models" ☆26 Updated 3 years ago
- Reduce the size of pretrained Hugging Face models via vocabulary trimming. ☆43 Updated 2 years ago
- Code for ProtAugment: Unsupervised diverse short-texts paraphrasing for intent detection meta-learning ☆21 Updated 2 years ago
- Code for NAACL 2021 full paper "Efficient Attentions for Long Document Summarization" ☆66 Updated 3 years ago