A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining
☆18 · Nov 26, 2023 · Updated 2 years ago
Alternatives and similar repositories for ofa
Users that are interested in ofa are comparing it to the libraries listed below.
- ☆10 · Sep 13, 2022 · Updated 3 years ago
- Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated 11 months ago
- Human Rights Corpus (#인권코퍼스) ☆31 · Oct 6, 2023 · Updated 2 years ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆90 · Sep 12, 2024 · Updated last year
- ☆10 · Dec 28, 2023 · Updated 2 years ago
- GlotCC Dataset and Pipeline -- NeurIPS 2024 ☆20 · Apr 6, 2025 · Updated 11 months ago
- ☆23 · Oct 30, 2023 · Updated 2 years ago
- Difference-based Contrastive Learning for Korean Sentence Embeddings ☆23 · Mar 11, 2026 · Updated last week
- ☆10 · Dec 17, 2020 · Updated 5 years ago
- Implementation for NeurIPS 2024 paper "SAFE: Slow and Fast Parameter-Efficient Tuning for Continual Learning with Pre-Trained Models" (ht… ☆14 · Dec 23, 2024 · Updated last year
- ☆15 · Mar 8, 2024 · Updated 2 years ago
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models" ☆36 · Jun 7, 2025 · Updated 9 months ago
- Code for Zero-Shot Tokenizer Transfer ☆143 · Jan 14, 2025 · Updated last year
- ☆36 · Oct 4, 2023 · Updated 2 years ago
- ☆17 · Dec 16, 2022 · Updated 3 years ago
- Google's official ROUGE implementation, adapted so that it can be used for Korean ☆18 · Jan 3, 2024 · Updated 2 years ago
- PyTorch implementation of NAACL 2021 paper "Multi-view Subword Regularization" ☆26 · Jun 2, 2021 · Updated 4 years ago
- [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision ☆96 · Oct 30, 2024 · Updated last year
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆191 · Mar 12, 2026 · Updated last week
- PathPiece tokenizer ☆14 · Nov 10, 2024 · Updated last year
- ☆15 · Apr 15, 2024 · Updated last year
- Bias, Hate classification with KoELECTRA ☆27 · Jun 12, 2023 · Updated 2 years ago
- PyTorch source code of NAACL 2021 paper "Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Tran… ☆18 · Oct 18, 2022 · Updated 3 years ago
- A python library for easily querying morphological inflection models trained on Unimorph ☆13 · Oct 23, 2022 · Updated 3 years ago
- SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects ☆23 · Jan 26, 2025 · Updated last year
- KLUE Benchmark 1st place (2021.12) solutions. (RE, MRC, NLI, STS, TC) ☆25 · Apr 11, 2022 · Updated 3 years ago
- Easy-to-use framework for evaluating cross-lingual consistency of factual knowledge (Supported LLaMA, BLOOM, mT5, RoBERTa, etc.) Paper he… ☆27 · Aug 8, 2025 · Updated 7 months ago
- A Jax/Flax implementation for Korean LLM inference on TPU. ☆12 · Jun 12, 2023 · Updated 2 years ago
- Enhanced version of WikiExtractor: a Wikipedia dump extractor ☆28 · Sep 17, 2025 · Updated 6 months ago
- ☆59 · Jan 2, 2024 · Updated 2 years ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆106 · Apr 20, 2024 · Updated last year
- Getting interpretable dimensions in word embedding spaces. ☆15 · Jul 6, 2023 · Updated 2 years ago
- final-project-level3-nlp-02 created by GitHub Classroom ☆11 · Dec 31, 2021 · Updated 4 years ago
- Repository of PIXAR, a Pixel-based Auto-Regressive Language Model ☆18 · Sep 15, 2025 · Updated 6 months ago
- Transformer Model for Lip Reading in the Wild (LRW) Benchmark ☆12 · Mar 18, 2023 · Updated 3 years ago
- This repository includes the masking vocabulary used in the ICLR 2021 spotlight PMI-Masking paper ☆14 · Aug 9, 2021 · Updated 4 years ago
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021) ☆18 · May 10, 2023 · Updated 2 years ago
- MINERS: The semantic retrieval benchmark for evaluating multilingual language models. (EMNLP 2024 Findings) ☆14 · Oct 3, 2024 · Updated last year
- AutoRAG example about benchmarking Korean embeddings. ☆43 · Oct 2, 2024 · Updated last year