[NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining
☆18 · Nov 26, 2023 · Updated 2 years ago
Alternatives and similar repositories for ofa
Users that are interested in ofa are comparing it to the libraries listed below.
- ☆10 · Sep 13, 2022 · Updated 3 years ago
- [ACL 2025] Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated last year
- #μΈκΆμ½νΌμ€ ☆31 · Oct 6, 2023 · Updated 2 years ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆90 · Sep 12, 2024 · Updated last year
- ☆10 · Dec 28, 2023 · Updated 2 years ago
- [NeurIPS 2024] GlotCC Dataset and Pipeline ☆20 · Apr 6, 2025 · Updated last year
- ☆23 · Oct 30, 2023 · Updated 2 years ago
- mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models ☆11 · Jan 19, 2024 · Updated 2 years ago
- Difference-based Contrastive Learning for Korean Sentence Embeddings ☆23 · Mar 11, 2026 · Updated last month
- [WWW 2026] GlotWeb: Web Indexing for Minority Languages ☆17 · Apr 14, 2026 · Updated 2 weeks ago
- [LREC 2024] Resource and Tool for Writing System Identification ☆21 · Mar 29, 2026 · Updated last month
- ☆10 · Dec 17, 2020 · Updated 5 years ago
- ☆15 · Mar 8, 2024 · Updated 2 years ago
- [EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models" ☆36 · Jun 7, 2025 · Updated 10 months ago
- Code for Zero-Shot Tokenizer Transfer ☆144 · Jan 14, 2025 · Updated last year
- ☆36 · Oct 4, 2023 · Updated 2 years ago
- ☆17 · Dec 16, 2022 · Updated 3 years ago
- Adapts Google's official ROUGE implementation so that it can be used for Korean ☆17 · Jan 3, 2024 · Updated 2 years ago
- PyTorch implementation of NAACL 2021 paper "Multi-view Subword Regularization" ☆26 · Jun 2, 2021 · Updated 4 years ago
- [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision ☆95 · Oct 30, 2024 · Updated last year
- [EMNLP 2023] Language Identification with Support for More Than 2000 Labels ☆200 · Apr 15, 2026 · Updated 2 weeks ago
- PathPiece tokenizer ☆14 · Nov 10, 2024 · Updated last year
- Bias, Hate classification with KoELECTRA ☆27 · Jun 12, 2023 · Updated 2 years ago
- PyTorch source code of NAACL 2021 paper "Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Tran… ☆18 · Oct 18, 2022 · Updated 3 years ago
- A Python library for easily querying morphological inflection models trained on Unimorph ☆13 · Oct 23, 2022 · Updated 3 years ago
- SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects ☆23 · Jan 26, 2025 · Updated last year
- KLUE Benchmark 1st place (2021.12) solutions. (RE, MRC, NLI, STS, TC) ☆25 · Apr 11, 2022 · Updated 4 years ago
- Easy-to-use framework for evaluating cross-lingual consistency of factual knowledge (Supported LLaMA, BLOOM, mT5, RoBERTa, etc.) Paper he… ☆28 · Aug 8, 2025 · Updated 8 months ago
- A Jax/Flax implementation for Korean LLM inference on TPU. ☆12 · Jun 12, 2023 · Updated 2 years ago
- Enhanced version of WikiExtractor: a Wikipedia dumps extractor ☆28 · Sep 17, 2025 · Updated 7 months ago
- [ACL 2023] Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages ☆106 · Apr 14, 2026 · Updated 2 weeks ago
- Getting interpretable dimensions in word embedding spaces. ☆15 · Jul 6, 2023 · Updated 2 years ago
- ☆61 · Jan 2, 2024 · Updated 2 years ago
- final-project-level3-nlp-02 created by GitHub Classroom ☆11 · Dec 31, 2021 · Updated 4 years ago
- Transformer Model for Lip Reading in the Wild (LRW) Benchmark ☆12 · Mar 18, 2023 · Updated 3 years ago
- This repository includes the masking vocabulary used in the ICLR 2021 spotlight PMI-Masking paper ☆14 · Aug 9, 2021 · Updated 4 years ago
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021) ☆18 · May 10, 2023 · Updated 2 years ago
- MINERS: The semantic retrieval benchmark for evaluating multilingual language models. (EMNLP 2024 Findings) ☆14 · Oct 3, 2024 · Updated last year
- ☆12 · Mar 17, 2026 · Updated last month