Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
☆11 · Apr 6, 2025 · Updated 10 months ago
Alternatives and similar repositories for MEXA
Users who are interested in MEXA are comparing it to the repositories listed below.
- GlotWeb: Web Indexing for Minority Languages (WWW 2026) ☆17 · Updated this week
- GlotCC Dataset and Pipeline -- NeurIPS 2024 ☆20 · Apr 6, 2025 · Updated 10 months ago
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks ☆12 · Nov 9, 2021 · Updated 4 years ago
- GlotEval: a unified evaluation toolkit designed to benchmark multilingual Large Language Models (LLMs) in a language-specific way ☆18 · Nov 4, 2025 · Updated 4 months ago
- Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations" ☆13 · Dec 14, 2021 · Updated 4 years ago
- ☆15 · Mar 8, 2024 · Updated last year
- Can LLMs generate code-mixed sentences through zero-shot prompting? ☆11 · Apr 18, 2023 · Updated 2 years ago
- SCT: An Efficient Self-Supervised Cross-View Training For Sentence Embedding (TACL) ☆16 · Jul 27, 2024 · Updated last year
- Minimal code to train ELMo models in recent versions of TensorFlow ☆14 · Apr 30, 2023 · Updated 2 years ago
- Curriculum training ☆22 · Jun 25, 2025 · Updated 8 months ago
- Code base for the EMNLP 2021 Findings paper: Cartography Active Learning ☆14 · Jun 3, 2025 · Updated 9 months ago
- A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- Resource and Tool for Writing System Identification (Unicode 17.0) -- LREC 2024 ☆21 · Feb 17, 2026 · Updated 2 weeks ago
- ☆21 · Dec 30, 2022 · Updated 3 years ago
- An extension of the Transformers library to include a T5ForSequenceClassification class. ☆40 · Apr 17, 2023 · Updated 2 years ago
- Temporarily remove unused tokens during training to save RAM and improve speed. ☆23 · Jun 15, 2025 · Updated 8 months ago
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource… ☆26 · Feb 16, 2026 · Updated 2 weeks ago
- PyTorch source code of NAACL 2021 paper "Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Tran… ☆18 · Oct 18, 2022 · Updated 3 years ago
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation" ☆22 · Feb 14, 2024 · Updated 2 years ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆28 · Apr 17, 2024 · Updated last year
- The geometry of multilingual language model representations (EMNLP 2022). ☆22 · Oct 21, 2022 · Updated 3 years ago
- ☆44 · Feb 11, 2026 · Updated 2 weeks ago
- SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects ☆23 · Jan 26, 2025 · Updated last year
- Code and data for the paper "Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?" ☆26 · Jun 3, 2025 · Updated 9 months ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆106 · Apr 20, 2024 · Updated last year
- Easy-to-use framework for evaluating cross-lingual consistency of factual knowledge (supports LLaMA, BLOOM, mT5, RoBERTa, etc.) Paper he… ☆27 · Aug 8, 2025 · Updated 6 months ago
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning ☆30 · Jan 25, 2023 · Updated 3 years ago
- A collection of notebooks for Natural Language Processing ☆25 · Jan 13, 2025 · Updated last year
- Official code release for "SuperBPE: Space Travel for Language Models" ☆89 · Jan 9, 2026 · Updated last month
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models" ☆28 · Oct 3, 2021 · Updated 4 years ago
- Code for "BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised Named Entity Recognition" ☆32 · Jun 20, 2023 · Updated 2 years ago
- A collection of templates for Hugging Face Spaces ☆35 · Oct 9, 2023 · Updated 2 years ago
- Eh, simple and works. ☆27 · Dec 9, 2023 · Updated 2 years ago
- Finite-state script normalization and processing utilities ☆46 · Updated this week
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ☆30 · Apr 2, 2022 · Updated 3 years ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆74 · Apr 1, 2025 · Updated 11 months ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆35 · Aug 15, 2023 · Updated 2 years ago
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) ☆34 · Aug 6, 2023 · Updated 2 years ago
- [NeurIPS 2023] Code base for the Renyi Kernel Entropy (RKE) metric for generative models. ☆13 · Jun 18, 2025 · Updated 8 months ago