[EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"
☆36 · Jun 7, 2025 · Updated 9 months ago
Alternatives and similar repositories for focus
Users that are interested in focus are comparing it to the libraries listed below.
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆90 · Sep 12, 2024 · Updated last year
- ☆16 · Jun 14, 2024 · Updated last year
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning ☆30 · Jan 25, 2023 · Updated 3 years ago
- Code for Zero-Shot Tokenizer Transfer ☆143 · Jan 14, 2025 · Updated last year
- A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- Experiments for XLM-V Transformers Integration ☆13 · Feb 8, 2023 · Updated 3 years ago
- [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision ☆96 · Oct 30, 2024 · Updated last year
- SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects ☆23 · Jan 26, 2025 · Updated last year
- Combining encoder-based language models ☆11 · Nov 11, 2021 · Updated 4 years ago
- [Konvens21] This repository contains the DFKI MobIE Corpus, a dataset of 3,232 German-language documents that have been annotated with fi… ☆12 · Sep 17, 2024 · Updated last year
- Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting ir… ☆63 · Oct 25, 2024 · Updated last year
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆106 · Apr 20, 2024 · Updated last year
- This repository contains an extension of fairseq for pixel / visual representations of text for machine translation. ☆37 · Feb 2, 2024 · Updated 2 years ago
- AVocaDo: Strategy for Adapting Vocabulary to Downstream Domain ☆23 · May 31, 2022 · Updated 3 years ago
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated 11 months ago
- Ukrainian ELECTRA model ☆12 · Mar 11, 2023 · Updated 3 years ago
- A Python library for easily querying morphological inflection models trained on Unimorph ☆13 · Oct 23, 2022 · Updated 3 years ago
- Contains code used to conduct experiments on dependency parsing with the Tensor-LSTM model developed for our paper "Cross-Lingual Depende… ☆13 · Jan 5, 2017 · Updated 9 years ago
- ☆11 · Mar 15, 2024 · Updated 2 years ago
- COMET for African languages ☆11 · Jan 24, 2025 · Updated last year
- An opinionated NLP research template ☆10 · Aug 29, 2024 · Updated last year
- Repository of PIXAR, a Pixel-based Auto-Regressive Language Model ☆18 · Sep 15, 2025 · Updated 6 months ago
- A simple semi-supervised approach for creating Hugging Face data script loaders and uploading them to the Hub. ☆11 · Jun 23, 2024 · Updated last year
- GPT-2 Metadata Pretraining Towards Instruction Finetuning for Ukrainian ☆20 · Aug 6, 2023 · Updated 2 years ago
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆191 · Mar 12, 2026 · Updated 2 weeks ago
- Seed Machine Translation Data ☆33 · Nov 12, 2024 · Updated last year
- Terminal UI for monitoring SLURM jobs ☆14 · Mar 20, 2026 · Updated last week
- Named Entity Recognition in Nepali Language ☆10 · Jan 12, 2023 · Updated 3 years ago
- MAIR: A Massive Benchmark for Evaluating Instructed Retrieval. Evaluate your retrieval models on 126 diverse tasks. [EMNLP 2024] ☆24 · Nov 3, 2024 · Updated last year
- Scripture Forge: Collaborative translation with suggestions and community Scripture checking; all connected to Paratext ☆11 · Updated this week
- Python source code for EMNLP 2021 Findings paper: "Subword Mapping and Anchoring Across Languages". ☆13 · Sep 17, 2021 · Updated 4 years ago
- Ukrainian NER annotation project ☆92 · Apr 23, 2025 · Updated 11 months ago
- MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning ☆97 · Aug 15, 2023 · Updated 2 years ago
- Code for paper "Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability" ☆15 · Jun 13, 2023 · Updated 2 years ago
- Lowering PyTorch's Memory Consumption for Selective Differentiation ☆12 · Aug 29, 2024 · Updated last year
- A Smalltalk Web Browser for Squeak/Smalltalk ☆17 · Apr 18, 2022 · Updated 3 years ago
- The MWE identification system, MTLB-STRUCT, participated in the PARSEME 1.2 Shared Task on semi-supervised identification of verbal multi… ☆14 · Mar 11, 2024 · Updated 2 years ago
- Toolkit for a learning health system ☆27 · Jan 12, 2026 · Updated 2 months ago
- Python package for compressing floating-point PyTorch tensors ☆13 · Jul 22, 2024 · Updated last year