cisnlp/ofa

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/cisnlp/ofa)

cisnlp / ofa

[NAACL 2024] A Framework aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining

☆18

Alternatives and similar repositories for ofa

Users that are interested in ofa are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

tatHi / maxmatch_dropout
View on GitHub
☆10Sep 13, 2022Updated 3 years ago
cisnlp / GlotCC
View on GitHub
[NeurIPS 2024] 🕸 GlotCC Dataset and Pipline
☆21Apr 6, 2025Updated last year
cisnlp / multypo
View on GitHub
A Multilingual Keyboard Layout-Based Typo Generator
☆17Nov 23, 2025Updated 8 months ago
human-rights-corpus / HRC
View on GitHub
#인권코퍼스
☆31Oct 6, 2023Updated 2 years ago
cisnlp / MEXA
View on GitHub
[ACL 2025] 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
☆11Apr 6, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
cisnlp / GlotWeb
View on GitHub
[WWW 2026] 🕸 GlotWeb: Web Indexing for Minority Languages
☆17Apr 14, 2026Updated 3 months ago
sangHa0411 / Llama-Instruction-Tuning
View on GitHub
☆10Dec 28, 2023Updated 2 years ago
CPJKU / wechsel
View on GitHub
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.
☆92Sep 12, 2024Updated last year
Data-Intelligence-Lab / DEFT-korean-alpaca
View on GitHub
☆23Oct 30, 2023Updated 2 years ago
BM-K / KoDiffCSE
View on GitHub
Difference-based Contrastive Learning for Korean Sentence Embeddings
☆23Mar 11, 2026Updated 4 months ago
cisnlp / mPLM-Sim
View on GitHub
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
☆11Jan 19, 2024Updated 2 years ago
philschmid / multilingual-serverless-qa-aws-lambda
View on GitHub
☆10Dec 17, 2020Updated 5 years ago
cisnlp / GlotScript
View on GitHub
[LREC 2024] 🖋 Resource and Tool for Writing System Identification
☆22Mar 29, 2026Updated 3 months ago
gowitheflow-1998 / Pixel-Linguist
View on GitHub
☆15Mar 8, 2024Updated 2 years ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
konstantinjdobler / focus
View on GitHub
[EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"
☆37Jun 7, 2025Updated last year
mainlp / Multilingual-Refusal
View on GitHub
☆16Nov 5, 2025Updated 8 months ago
bminixhofer / zett
View on GitHub
Code for Zero-Shot Tokenizer Transfer
☆145Jan 14, 2025Updated last year
sangHa0411 / fashion-how
View on GitHub
☆17Dec 16, 2022Updated 3 years ago
MrBananaHuman / PangyoCorpora
View on GitHub
☆36Oct 4, 2023Updated 2 years ago
HeegyuKim / korouge
View on GitHub
Google 공식 Rouge Implementation을 한국어에서 사용할 수 있도록 처리
☆17Jan 3, 2024Updated 2 years ago
dadelani / sib-200
View on GitHub
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
☆26May 20, 2026Updated 2 months ago
alexandra-chron / lexical_xlm_relm
View on GitHub
PyTorch source code of NAACL 2021 paper "Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Tran…
☆18Oct 18, 2022Updated 3 years ago
cindyxinyiwang / multiview-subword-regularization
View on GitHub
PyTorch implementation of NAACL 2021 paper "Multi-view Subword Regularization"
☆26Jun 2, 2021Updated 5 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
monologg / korean-hate-speech-koelectra
View on GitHub
Bias, Hate classification with KoELECTRA 👿
☆27Jun 12, 2023Updated 3 years ago
LostCow / KLUE
View on GitHub
KLUE Benchmark 1st place (2021.12) solutions. (RE, MRC, NLI, STS, TC)
☆25Apr 11, 2022Updated 4 years ago
kaistAI / LangBridge
View on GitHub
[ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision
☆97Oct 30, 2024Updated last year
kensho-technologies / pathpiece
View on GitHub
PathPiece tokenizer
☆14Nov 10, 2024Updated last year
snoop2head / Tokenized-Lip-Reading
View on GitHub
👄 Transformer Model for Lip Reading in the Wild (LRW) Benchmark
☆12Mar 18, 2023Updated 3 years ago
antonisa / unimorph_inflect
View on GitHub
A python library for easily querying morphological inflection models trained on Unimorph
☆13Oct 23, 2022Updated 3 years ago
affjljoo3581 / polyglot-jax-inference
View on GitHub
TPU에서 한국어용 LLM 추론을 위한 Jax/Flax 구현체입니다.
☆12Jun 12, 2023Updated 3 years ago
Betswish / Cross-Lingual-Consistency
View on GitHub
Easy-to-use framework for evaluating cross-lingual consistency of factual knowledge (Supported LLaMA, BLOOM, mT5, RoBERTa, etc.) Paper he…
☆28Aug 8, 2025Updated 11 months ago
iPieter / llmq
View on GitHub
A Scheduler for Batched LLM Inference
☆19Oct 5, 2025Updated 9 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
gentaiscool / miners
View on GitHub
MINERS ⛏️: The semantic retrieval benchmark for evaluating multilingual language models. (EMNLP 2024 Findings)
☆14Oct 3, 2024Updated last year
boostcampaitech2 / final-project-level3-nlp-02
View on GitHub
final-project-level3-nlp-02 created by GitHub Classroom
☆11Dec 31, 2021Updated 4 years ago
pdufter / densray
View on GitHub
Getting interpretable dimensions in word embedding spaces.
☆15Jul 6, 2023Updated 3 years ago
cisnlp / Glot500
View on GitHub
[ACL 2023] Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
☆107Apr 14, 2026Updated 3 months ago
AI21Labs / pmi-masking
View on GitHub
This repository includes the masking vocabulary used in the ICLR 2021 spotlight PMI-Masking paper
☆14Aug 9, 2021Updated 4 years ago
yuweihao / LV-BERT
View on GitHub
LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)
☆18May 10, 2023Updated 3 years ago
sionic-ai / Data_KoSuperNI
View on GitHub
StrategyQA 데이터 세트 번역
☆22Apr 12, 2024Updated 2 years ago