CPJKU/wechsel

Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.

☆92

Alternatives and similar repositories for wechsel

Users who are interested in wechsel are comparing it to the libraries listed below.

malteos / clp-transfer
Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning
☆30 • Jan 25, 2023 • Updated 3 years ago
konstantinjdobler / focus
[EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"
☆37 • Jun 7, 2025 • Updated last year
cisnlp / ofa
[NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining
☆18 • Nov 26, 2023 • Updated 2 years ago
GeorgeVern / smala
Python source code for EMNLP 2021 Findings paper: "Subword Mapping and Anchoring Across Languages".
☆13 • Sep 17, 2021 • Updated 4 years ago
bminixhofer / zett
Code for Zero-Shot Tokenizer Transfer
☆145 • Jan 14, 2025 • Updated last year
alexa / ramen
Software for transferring pre-trained English models to foreign languages
☆20 • Mar 20, 2023 • Updated 3 years ago
jason9693 / oslo-kogpt-finetunig
An example of fine-tuning KoGPT with OSLO.
☆23 • Aug 26, 2022 • Updated 3 years ago
philschmid / multilingual-serverless-qa-aws-lambda
☆10 • Dec 17, 2020 • Updated 5 years ago
stefan-it / xlm-v-experiments
Experiments for XLM-V Transformers Integration
☆13 • Feb 8, 2023 • Updated 3 years ago
HKUNLP / multilingual-transfer
Code for paper "Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability"
☆15 • Jun 13, 2023 • Updated 3 years ago
dhfbk / KIND
KIND: an Italian Multi-Domain Dataset for Named Entity Recognition
☆13 • Jun 28, 2023 • Updated 3 years ago
alexandra-chron / lexical_xlm_relm
PyTorch source code of NAACL 2021 paper "Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Tran…
☆18 • Oct 18, 2022 • Updated 3 years ago
dadelani / sib-200
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
☆26 • May 20, 2026 • Updated 2 months ago
DFKI-NLP / MobIE
[Konvens21] This repository contains the DFKI MobIE Corpus, a dataset of 3,232 German-language documents that have been annotated with fi…
☆12 • Sep 17, 2024 • Updated last year
krangelie / bias-in-german-nlg
Master thesis: Exploring bias in German NLG (GPT-3 & GerPT-2). Applies regard classification and bias mitigation triggers.
☆16 • Sep 25, 2024 • Updated last year
wietsedv / xpos
Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages (ACL 2022)
☆19 • May 17, 2022 • Updated 4 years ago
cisnlp / MEXA
[ACL 2025] 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
☆11 • Apr 6, 2025 • Updated last year
frankxu2004 / knnlm-why
Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"
☆59 • Jan 12, 2023 • Updated 3 years ago
ahmetustun / hyperx
☆21 • Dec 5, 2022 • Updated 3 years ago
microsoft / MetaXL
Meta Representation Transformation for Low-resource Cross-lingual Learning
☆41 • May 5, 2021 • Updated 5 years ago
kaistAI / LangBridge
[ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision
☆97 • Oct 30, 2024 • Updated last year
ClimSocAna / tecb-de
German Text Embedding Clustering Benchmark
☆19 • Mar 15, 2024 • Updated 2 years ago
bigscience-workshop / evaluation
Code and Data for Evaluation WG
☆42 • May 4, 2022 • Updated 4 years ago
AnesBenmerzoug / langsfer
A library for language transfer methods and algorithms.
☆16 • Feb 6, 2026 • Updated 5 months ago
applicaai / pyramidions
This repository contains a demonstrative implementation for pooling-based models, e.g., DeepPyramidion complementing our paper "Sparsifyi…
☆14 • May 15, 2022 • Updated 4 years ago
bltlab / mot
Multilingual Open Text
☆26 • May 8, 2025 • Updated last year
cisnlp / GlotScript
[LREC 2024] 🖋 Resource and Tool for Writing System Identification
☆22 • Mar 29, 2026 • Updated 3 months ago
bigcode-project / bigcode-tokenizer
☆15 • Oct 24, 2023 • Updated 2 years ago
Betswish / Cross-Lingual-Consistency
Easy-to-use framework for evaluating cross-lingual consistency of factual knowledge (supports LLaMA, BLOOM, mT5, RoBERTa, etc.) Paper he…
☆28 • Aug 8, 2025 • Updated 11 months ago
uclanlp / synpg
Code for our EACL-2021 paper "Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs".
☆38 • Jun 24, 2024 • Updated 2 years ago
leonweber / pedl
Search the biomedical literature for protein interactions and protein associations
☆11 • Nov 24, 2023 • Updated 2 years ago
huggingface / olm-training
Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset.
☆98 • Feb 9, 2023 • Updated 3 years ago
tlkh / t2t-tuner
Convenient Text-to-Text Training for Transformers
☆18 • Dec 10, 2021 • Updated 4 years ago
cisnlp / mPLM-Sim
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
☆11 • Jan 19, 2024 • Updated 2 years ago
wietsedv / gpt2-recycle
As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021)
☆48 • Aug 2, 2021 • Updated 4 years ago
openredact / nerwhal
This is a prototype of a multi-lingual suite for named-entity recognition in Python. ➡️ The project has moved to: https://gitlab.opencode…
☆21 • Mar 20, 2026 • Updated 4 months ago
Yinghao-Li / CHMM-ALT
Code for "BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised Named Entity Recognition"
☆32 • Jun 20, 2023 • Updated 3 years ago
huggingface / datasets-tagging
A Streamlit app to add structured tags to a dataset card
☆23 • Jun 30, 2022 • Updated 4 years ago
MiniXC / phones
A collection of utilities for handling IPA phones.
☆27 • Sep 24, 2023 • Updated 2 years ago