A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining
★18 · Nov 26, 2023 · Updated 2 years ago
Alternatives and similar repositories for ofa
Users that are interested in ofa are comparing it to the libraries listed below.
- ★10 · Sep 13, 2022 · Updated 3 years ago
- Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment · ★11 · Apr 6, 2025 · Updated last year
- #인권코퍼스 (Human Rights Corpus) · ★31 · Oct 6, 2023 · Updated 2 years ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. · ★90 · Sep 12, 2024 · Updated last year
- ★10 · Dec 28, 2023 · Updated 2 years ago
- GlotCC Dataset and Pipeline -- NeurIPS 2024 · ★20 · Apr 6, 2025 · Updated last year
- ★23 · Oct 30, 2023 · Updated 2 years ago
- mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models · ★11 · Jan 19, 2024 · Updated 2 years ago
- Difference-based Contrastive Learning for Korean Sentence Embeddings · ★23 · Mar 11, 2026 · Updated last month
- GlotWeb: Web Indexing for Minority Languages (WWW 2026) · ★17 · Feb 27, 2026 · Updated last month
- Resource and Tool for Writing System Identification (Unicode 17.0) -- LREC 2024 · ★21 · Mar 29, 2026 · Updated 2 weeks ago
- ★10 · Dec 17, 2020 · Updated 5 years ago
- ★15 · Mar 8, 2024 · Updated 2 years ago
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models" · ★36 · Jun 7, 2025 · Updated 10 months ago
- Code for Zero-Shot Tokenizer Transfer · ★144 · Jan 14, 2025 · Updated last year
- ★36 · Oct 4, 2023 · Updated 2 years ago
- ★17 · Dec 16, 2022 · Updated 3 years ago
- Adapts Google's official ROUGE implementation for use with Korean · ★18 · Jan 3, 2024 · Updated 2 years ago
- PyTorch implementation of NAACL 2021 paper "Multi-view Subword Regularization" · ★26 · Jun 2, 2021 · Updated 4 years ago
- [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision · ★96 · Oct 30, 2024 · Updated last year
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 · ★197 · Mar 27, 2026 · Updated 2 weeks ago
- PathPiece tokenizer · ★14 · Nov 10, 2024 · Updated last year
- Bias, Hate classification with KoELECTRA · ★27 · Jun 12, 2023 · Updated 2 years ago
- PyTorch source code of NAACL 2021 paper "Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Tran… · ★18 · Oct 18, 2022 · Updated 3 years ago
- A python library for easily querying morphological inflection models trained on Unimorph · ★13 · Oct 23, 2022 · Updated 3 years ago
- KLUE Benchmark 1st place (2021.12) solutions. (RE, MRC, NLI, STS, TC) · ★25 · Apr 11, 2022 · Updated 4 years ago
- Easy-to-use framework for evaluating cross-lingual consistency of factual knowledge (Supported LLaMA, BLOOM, mT5, RoBERTa, etc.) Paper he… · ★28 · Aug 8, 2025 · Updated 8 months ago
- A Jax/Flax implementation for Korean LLM inference on TPU. · ★12 · Jun 12, 2023 · Updated 2 years ago
- Enhanced version of Wikiextractor: a Wikipedia dump extractor · ★28 · Sep 17, 2025 · Updated 6 months ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 · ★106 · Apr 20, 2024 · Updated last year
- ★61 · Jan 2, 2024 · Updated 2 years ago
- Getting interpretable dimensions in word embedding spaces. · ★15 · Jul 6, 2023 · Updated 2 years ago
- final-project-level3-nlp-02 created by GitHub Classroom · ★11 · Dec 31, 2021 · Updated 4 years ago
- Repository of PIXAR, a Pixel-based Auto-Regressive Language Model · ★18 · Sep 15, 2025 · Updated 6 months ago
- Transformer Model for Lip Reading in the Wild (LRW) Benchmark · ★12 · Mar 18, 2023 · Updated 3 years ago
- This repository includes the masking vocabulary used in the ICLR 2021 spotlight PMI-Masking paper · ★14 · Aug 9, 2021 · Updated 4 years ago
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021) · ★18 · May 10, 2023 · Updated 2 years ago
- MINERS ⛏️: The semantic retrieval benchmark for evaluating multilingual language models. (EMNLP 2024 Findings) · ★14 · Oct 3, 2024 · Updated last year
- AutoRAG example about benchmarking Korean embeddings. · ★44 · Oct 2, 2024 · Updated last year