[EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"
☆36Jun 7, 2025Updated 10 months ago
Alternatives and similar repositories for focus
Users that are interested in focus are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.☆90Sep 12, 2024Updated last year
- ☆16Jun 14, 2024Updated last year
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning☆30Jan 25, 2023Updated 3 years ago
- Code for Zero-Shot Tokenizer Transfer☆144Jan 14, 2025Updated last year
- [NAACL 2024] A Framework aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining☆18Nov 26, 2023Updated 2 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- Experiments for XLM-V Transformers Integeration☆13Feb 8, 2023Updated 3 years ago
- [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision☆95Oct 30, 2024Updated last year
- SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects☆23Jan 26, 2025Updated last year
- Combining encoder-based language models☆11Nov 11, 2021Updated 4 years ago
- Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting ir…☆66Oct 25, 2024Updated last year
- [ACL 2023] Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages☆106Updated this week
- This repository contains an extension of fairseq for pixel / visual representations of text for machine translation.☆37Feb 2, 2024Updated 2 years ago
- AVocaDo : Strategy for Adapting Vocabulary to Downstream Domain☆23May 31, 2022Updated 3 years ago
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment☆11Apr 6, 2025Updated last year
- Serverless GPU API endpoints on Runpod - Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- Ukrainian ELECTRA model☆12Mar 11, 2023Updated 3 years ago
- A python library for easily querying morphological inflection models trained on Unimorph☆13Oct 23, 2022Updated 3 years ago
- Contains code used to conduct experiments on dependency parsing with the Tensor-LSTM model developed for our paper "Cross-Lingual Depende…☆13Jan 5, 2017Updated 9 years ago
- ☆11Mar 15, 2024Updated 2 years ago
- COMET for African languages☆11Jan 24, 2025Updated last year
- Repository of PIXAR, a Pixel-based Auto-Regressive Language Model☆18Sep 15, 2025Updated 7 months ago
- An opinionated NLP research template☆10Aug 29, 2024Updated last year
- A simple semi-supervised approach for creating huggingface data script loaders and upload to the hub.☆11Jun 23, 2024Updated last year
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023☆197Mar 27, 2026Updated 2 weeks ago
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- Named Entity Recognition in Nepali Language☆10Jan 12, 2023Updated 3 years ago
- MAIR: A Massive Benchmark for Evaluating Instructed Retrieval. Evaluate your retrieval models on 126 diverse tasks. [EMNLP 2024]☆24Nov 3, 2024Updated last year
- Scripture Forge: Collaborative translation with suggestions and community Scripture checking; all connected to Paratext☆11Updated this week
- Python source code for EMNLP 2021 Findings paper: "Subword Mapping and Anchoring Across Languages".☆13Sep 17, 2021Updated 4 years ago
- MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning☆97Aug 15, 2023Updated 2 years ago
- suffix array construction and searching algorithms for in-memory binary data.☆12Sep 10, 2022Updated 3 years ago
- audio, NLP, ML with huggingface, nvidia/nemo, speechbrain☆11Sep 4, 2023Updated 2 years ago
- Code for paper ”Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability“☆15Jun 13, 2023Updated 2 years ago
- Lowering PyTorch's Memory Consumption for Selective Differentiation☆12Aug 29, 2024Updated last year
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- A scalable solution that simplifies the integration of ComfyUI for developers☆11Jul 15, 2024Updated last year
- A Smalltalk Web Browser for Squeak/Smalltalk☆17Apr 18, 2022Updated 3 years ago
- From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks☆15Feb 23, 2023Updated 3 years ago
- The MWE identification system, MTLB-STRUCT, participated in the PARSEME 1.2 Shared Task on semi-supervised identification of verbal multi…☆14Mar 11, 2024Updated 2 years ago
- Python module to remove wiki markup text.☆10Jan 15, 2016Updated 10 years ago
- EMNLP2022 "Cross-Align: Modeling Deep Cross-lingual Interactions for Word Alignment"☆19Feb 19, 2023Updated 3 years ago
- ☆13May 23, 2024Updated last year