[EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"
☆36 · Jun 7, 2025 · Updated 10 months ago
Alternatives and similar repositories for focus
Users that are interested in focus are comparing it to the libraries listed below.
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆90 · Sep 12, 2024 · Updated last year
- A library for language transfer methods and algorithms. ☆16 · Feb 6, 2026 · Updated 2 months ago
- ☆16 · Jun 14, 2024 · Updated last year
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning ☆30 · Jan 25, 2023 · Updated 3 years ago
- Code for Zero-Shot Tokenizer Transfer ☆144 · Jan 14, 2025 · Updated last year
- [NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- Experiments for XLM-V Transformers Integration ☆13 · Feb 8, 2023 · Updated 3 years ago
- [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision ☆95 · Oct 30, 2024 · Updated last year
- SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects ☆23 · Jan 26, 2025 · Updated last year
- Combining encoder-based language models ☆11 · Nov 11, 2021 · Updated 4 years ago
- [Konvens21] This repository contains the DFKI MobIE Corpus, a dataset of 3,232 German-language documents that have been annotated with fi… ☆12 · Sep 17, 2024 · Updated last year
- Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting ir… ☆66 · Oct 25, 2024 · Updated last year
- [ACL 2023] Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages ☆106 · Apr 14, 2026 · Updated 3 weeks ago
- [ACL 2025] 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated last year
- Easily integrate rich ENS user journeys into your wallet, app, or game. ☆20 · Updated this week
- Ukrainian ELECTRA model ☆12 · Mar 11, 2023 · Updated 3 years ago
- A Python library for easily querying morphological inflection models trained on Unimorph ☆13 · Oct 23, 2022 · Updated 3 years ago
- Contains code used to conduct experiments on dependency parsing with the Tensor-LSTM model developed for our paper "Cross-Lingual Depende… ☆13 · Jan 5, 2017 · Updated 9 years ago
- COMET for African languages ☆11 · Jan 24, 2025 · Updated last year
- An opinionated NLP research template ☆10 · Aug 29, 2024 · Updated last year
- A simple semi-supervised approach for creating Hugging Face data loader scripts and uploading them to the Hub. ☆11 · Jun 23, 2024 · Updated last year
- GPT-2 Metadata Pretraining Towards Instruction Finetuning for Ukrainian ☆20 · Aug 6, 2023 · Updated 2 years ago
- Seed Machine Translation Data ☆34 · Nov 12, 2024 · Updated last year
- Named Entity Recognition in Nepali Language ☆10 · Jan 12, 2023 · Updated 3 years ago
- MAIR: A Massive Benchmark for Evaluating Instructed Retrieval. Evaluate your retrieval models on 126 diverse tasks. [EMNLP 2024] ☆25 · Nov 3, 2024 · Updated last year
- A plugin for Obsidian to sync selected files with Dynalist ☆17 · Dec 9, 2020 · Updated 5 years ago
- Python source code for the EMNLP 2021 Findings paper "Subword Mapping and Anchoring Across Languages". ☆13 · Sep 17, 2021 · Updated 4 years ago
- Ukrainian NER annotation project ☆93 · Apr 23, 2025 · Updated last year
- MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning ☆97 · Aug 15, 2023 · Updated 2 years ago
- Code and data for "Heterogeneous Supervised Topic Models" ☆10 · Jun 27, 2022 · Updated 3 years ago
- Code for the paper "Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability" ☆15 · Jun 13, 2023 · Updated 2 years ago
- Lowering PyTorch's Memory Consumption for Selective Differentiation ☆12 · Aug 29, 2024 · Updated last year
- A Smalltalk Web Browser for Squeak/Smalltalk ☆17 · Apr 18, 2022 · Updated 4 years ago
- From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks ☆15 · Feb 23, 2023 · Updated 3 years ago
- The MWE identification system, MTLB-STRUCT, participated in the PARSEME 1.2 Shared Task on semi-supervised identification of verbal multi… ☆14 · Mar 11, 2024 · Updated 2 years ago
- Research code for pixel-based encoders of language (PIXEL) ☆345 · Jul 15, 2025 · Updated 9 months ago
- ☆10 · Dec 21, 2024 · Updated last year
- This repository contains the code for the paper "Prompting ELECTRA: Few-Shot Learning with Discriminative Pre-Trained Models". ☆48 · Jun 7, 2022 · Updated 3 years ago
- 🕸️ A graph-augmented dense statute retriever. (EACL 2023) ☆25 · Sep 26, 2023 · Updated 2 years ago