SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
☆24 · May 20, 2026 · Updated last week
Alternatives and similar repositories for sib-200
Users who are interested in sib-200 also compare it to the repositories listed below.
- Crosslingual Question Answering for African Languages ☆31 · Sep 27, 2024 · Updated last year
- ☆12 · Mar 7, 2022 · Updated 4 years ago
- Creating super-parallel corpora of more than 1,500 unique languages for NLP research ☆34 · Dec 8, 2022 · Updated 3 years ago
- List of all the resources I developed in collaboration with LSV and Masakhane during my doctoral studies and beyond ☆13 · Aug 15, 2022 · Updated 3 years ago
- [ACL 2023] Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages ☆106 · Apr 14, 2026 · Updated last month
- [ACL 2025] 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated last year
- This is an ASR corpus for the Bemba language. It contains read speech from diverse publicly available Bemba sources: Literature Books, Radio/… ☆39 · Jul 31, 2025 · Updated 9 months ago
- Scripts to create speech corpora from open.bible ☆13 · Jan 3, 2022 · Updated 4 years ago
- [NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- Repository for the paper "Data Efficient Masked Language Modeling for Vision and Language". ☆18 · Sep 17, 2021 · Updated 4 years ago
- ☆15 · Mar 8, 2024 · Updated 2 years ago
- This is a repository for NaijaSenti, a project funded by Lacuna Fund for the development of a sentiment corpus for four Nigerian languages: Igbo, H… ☆38 · Oct 14, 2025 · Updated 7 months ago
- MAFAND-MT ☆62 · Jul 9, 2024 · Updated last year
- [EMNLP'23] Official code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models" ☆36 · Jun 7, 2025 · Updated 11 months ago
- ☆272 · Aug 1, 2025 · Updated 9 months ago
- PyTorch source code for the NAACL 2021 paper "Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Tran… ☆18 · Oct 18, 2022 · Updated 3 years ago
- A web crawler for the Best Global University Rankings on the usnews, Times Higher Education, and QS websites ☆13 · Dec 31, 2025 · Updated 4 months ago
- SeqScore: Scoring for named entity recognition and other sequence labeling tasks ☆23 · Mar 30, 2026 · Updated last month
- A Python library for easily querying morphological inflection models trained on UniMorph ☆13 · Oct 23, 2022 · Updated 3 years ago
- FeedbackQA: Improving Question Answering Post-Deployment with Interactive Feedback ☆12 · Jul 13, 2022 · Updated 3 years ago
- Easy-to-use framework for evaluating cross-lingual consistency of factual knowledge (supports LLaMA, BLOOM, mT5, RoBERTa, etc.) Paper he… ☆28 · Aug 8, 2025 · Updated 9 months ago
- Hausa-NMT: Empirical Study of Neural Machine Translation for English-Hausa-English ☆17 · Oct 20, 2020 · Updated 5 years ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆90 · Sep 12, 2024 · Updated last year
- Repository of PIXAR, a Pixel-based Auto-Regressive Language Model ☆19 · Sep 15, 2025 · Updated 8 months ago
- A simple semi-supervised approach for creating Hugging Face data loader scripts and uploading them to the Hub. ☆11 · Jun 23, 2024 · Updated last year
- Curate online Wolof text resources that can be used to build models ☆28 · May 11, 2026 · Updated 2 weeks ago
- DL Backtrace is a new explainability technique for deep learning models that works for any modality and model type. ☆26 · May 13, 2026 · Updated 2 weeks ago
- Named Entity Recognition in the Nepali Language ☆10 · Jan 12, 2023 · Updated 3 years ago
- Meta Representation Transformation for Low-resource Cross-lingual Learning ☆42 · May 5, 2021 · Updated 5 years ago
- Reproduction of Soft-Masked BERT, from the paper "Spelling Error Correction with Soft-Masked BERT" ☆12 · Oct 14, 2020 · Updated 5 years ago
- Code and data for "Heterogeneous Supervised Topic Models" ☆10 · Jun 27, 2022 · Updated 3 years ago
- Code for the paper "Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability" ☆15 · Jun 13, 2023 · Updated 2 years ago
- ☆121 · Oct 15, 2025 · Updated 7 months ago
- Package to align tokens from different tokenizations. ☆16 · Mar 25, 2024 · Updated 2 years ago
- My solution for the UC Berkeley AI Pacman projects ☆11 · Jul 25, 2020 · Updated 5 years ago
- Contains code used to conduct experiments on dependency parsing with the Tensor-LSTM model developed for our paper "Cross-Lingual Depende… ☆13 · Jan 5, 2017 · Updated 9 years ago
- Scripts for working with open.bible data ☆26 · Jan 24, 2022 · Updated 4 years ago
- Experiments for XLM-V Transformers Integration ☆13 · Feb 8, 2023 · Updated 3 years ago
- ☆14 · Apr 16, 2024 · Updated 2 years ago