[EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"
☆36Jun 7, 2025Updated 11 months ago
Alternatives and similar repositories for focus
Users that are interested in focus are comparing it to the libraries listed below.
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.☆90Sep 12, 2024Updated last year
- A library for language transfer methods and algorithms.☆16Feb 6, 2026Updated 3 months ago
- ☆16Jun 14, 2024Updated last year
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning☆30Jan 25, 2023Updated 3 years ago
- Code for Zero-Shot Tokenizer Transfer☆144Jan 14, 2025Updated last year
- [NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining☆18Nov 26, 2023Updated 2 years ago
- Experiments for XLM-V Transformers Integration☆13Feb 8, 2023Updated 3 years ago
- [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision☆95Oct 30, 2024Updated last year
- SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects☆24Updated this week
- Goldfish: Monolingual language models for 350 languages.☆26Mar 4, 2026Updated 2 months ago
- Combining encoder-based language models☆11Nov 11, 2021Updated 4 years ago
- Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting ir…☆67Oct 25, 2024Updated last year
- [ACL 2023] Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages☆106Apr 14, 2026Updated last month
- This repository contains an extension of fairseq for pixel / visual representations of text for machine translation.☆37Feb 2, 2024Updated 2 years ago
- AVocaDo : Strategy for Adapting Vocabulary to Downstream Domain☆23May 31, 2022Updated 3 years ago
- [ACL 2025] 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment☆11Apr 6, 2025Updated last year
- [ARCHIVED] Easily integrate rich ENS user journeys into your wallet, app, or game.☆20May 10, 2026Updated 2 weeks ago
- A python library for easily querying morphological inflection models trained on Unimorph☆13Oct 23, 2022Updated 3 years ago
- Contains code used to conduct experiments on dependency parsing with the Tensor-LSTM model developed for our paper "Cross-Lingual Depende…☆13Jan 5, 2017Updated 9 years ago
- ☆11Mar 15, 2024Updated 2 years ago
- COMET for African languages☆11Jan 24, 2025Updated last year
- Repository of PIXAR, a Pixel-based Auto-Regressive Language Model☆19Sep 15, 2025Updated 8 months ago
- An opinionated NLP research template☆10Aug 29, 2024Updated last year
- A simple semi-supervised approach for creating huggingface data script loaders and upload to the hub.☆11Jun 23, 2024Updated last year
- GPT-2 Metadata Pretraining Towards Instruction Finetuning for Ukrainian☆20Aug 6, 2023Updated 2 years ago
- [EMNLP 2023] 💬 Language Identification with Support for More Than 2000 Labels☆204Apr 15, 2026Updated last month
- bilingual dictionary extractor from parallel corpora☆23Jul 3, 2014Updated 11 years ago
- Terminal UI for monitoring SLURM jobs☆15Mar 29, 2026Updated last month
- Named Entity Recognition in Nepali Language☆10Jan 12, 2023Updated 3 years ago
- MAIR: A Massive Benchmark for Evaluating Instructed Retrieval. Evaluate your retrieval models on 126 diverse tasks. [EMNLP 2024]☆26Nov 3, 2024Updated last year
- Scripture Forge: Collaborative translation with suggestions and community Scripture checking; all connected to Paratext☆11Updated this week
- MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning☆97Aug 15, 2023Updated 2 years ago
- suffix array construction and searching algorithms for in-memory binary data.☆12Sep 10, 2022Updated 3 years ago
- Code and data for "Heterogeneous Supervised Topic Models"☆10Jun 27, 2022Updated 3 years ago
- Code for paper "Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability"☆15Jun 13, 2023Updated 2 years ago
- Lowering PyTorch's Memory Consumption for Selective Differentiation☆12Aug 29, 2024Updated last year
- A Smalltalk Web Browser for Squeak/Smalltalk☆18Apr 18, 2022Updated 4 years ago
- From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks☆15Feb 23, 2023Updated 3 years ago
- The MWE identification system, MTLB-STRUCT, participated in the PARSEME 1.2 Shared Task on semi-supervised identification of verbal multi…☆14Mar 11, 2024Updated 2 years ago