LAGoM-NLP/transtokenizer

☆57

Alternatives and similar repositories for transtokenizer

Users who are interested in transtokenizer are comparing it to the libraries listed below.

boschresearch / adversarial_meta_embeddings
Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations"
☆13 · Dec 14, 2021 · Updated 4 years ago
bminixhofer / zett
Code for Zero-Shot Tokenizer Transfer
☆145 · Jan 14, 2025 · Updated last year
Tinycompany-AI / tokenadapt
0-Shot Tokenizer Transplant
☆14 · May 16, 2025 · Updated last year
sufenlp / AccAlign
An accurate multilingual word aligner based on LaBSE
☆24 · Oct 25, 2023 · Updated 2 years ago
lukasgarbas / can-we-tune-together
Combining encoder-based language models
☆11 · Nov 11, 2021 · Updated 4 years ago
DiLi-Lab / ScanDL
☆14 · Apr 29, 2025 · Updated last year
philschmid / multilingual-serverless-qa-aws-lambda
☆10 · Dec 17, 2020 · Updated 5 years ago
yahshibu / nested-ner-tacl2020-flair
Implementation of Nested Named Entity Recognition using Flair
☆24 · Oct 29, 2021 · Updated 4 years ago
mcognetta / LotteryTickets.jl
Sparsify Your Flux Models
☆14 · Sep 20, 2023 · Updated 2 years ago
owos / flexitokens
FlexiTokens
☆23 · Dec 27, 2025 · Updated 7 months ago
teekuningas / sparsecca
Python implementations for Sparse CCA
☆21 · Feb 24, 2023 · Updated 3 years ago
jqueguiner / wav2vec2-sprint
Docker for HF wav2vec2-sprint
☆13 · Mar 26, 2021 · Updated 5 years ago
NiuTrans / ForgettingCurve
A benchmark for testing memorization abilities of LMs
☆24 · Oct 15, 2024 · Updated last year
adapter-hub / hgiyt
Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"
☆28 · Oct 3, 2021 · Updated 4 years ago
flairNLP / transformer-ranker
Efficiently find the best-suited language model (LM) for your NLP task
☆134 · Jul 26, 2025 · Updated last year
bminixhofer / tokenkit
A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers.
☆69 · Jul 6, 2025 · Updated last year
taidopurason / tokenizer-extension
☆15 · Dec 4, 2025 · Updated 7 months ago
cisnlp / ofa
[NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining
☆18 · Nov 26, 2023 · Updated 2 years ago
randombk / llm2sh
Ask GPT to run a command
☆196 · May 21, 2026 · Updated 2 months ago
sfeucht / footprints
https://footprints.baulab.info
☆17 · Oct 4, 2024 · Updated last year
lightblue-tech / lb-reranker
☆24 · Jan 30, 2025 · Updated last year
sanderland / script_tok
Code for the papers "BPE stays on SCRIPT" and "Which Pieces Does Unigram Tokenization Really Need?", and for MinGram
☆18 · Jun 26, 2026 · Updated last month
EyeBench / eyebench
EyeBench: Predictive Modeling from Eye Movements in Reading
☆17 · Apr 6, 2026 · Updated 3 months ago
cisnlp / multypo
A Multilingual Keyboard Layout-Based Typo Generator
☆17 · Nov 23, 2025 · Updated 8 months ago
orionw / promptriever
The first dense retrieval model that can be prompted like an LM
☆93 · May 8, 2025 · Updated last year
Knowledgator / FlashDeBERTa
Truly flash implementation of the DeBERTa disentangled attention mechanism.
☆90 · Feb 10, 2026 · Updated 5 months ago
clinicalml / co-llm
Co-LLM: Learning to Decode Collaboratively with Multiple Language Models
☆128 · May 7, 2024 · Updated 2 years ago
MinishLab / tokenlearn
Pre-train Static Word Embeddings
☆109 · Jun 9, 2026 · Updated last month
thu-coai / Stylized-Story-Generation-with-Style-Guided-Planning
Code for the paper "Stylized Story Generation with Style-Guided Planning"
☆12 · May 9, 2021 · Updated 5 years ago
wietsedv / xpos
Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages (ACL 2022)
☆19 · May 17, 2022 · Updated 4 years ago
rycolab / kl-rb
This repository contains code for the paper "Better Estimation of the KL Divergence Between Language Models"
☆19 · May 30, 2025 · Updated last year
kinoshitadaisuke / ncu_astroinformatics_202209
The repository for the course "Astroinformatics" offered at Institute of Astronomy, National Central University, from Sep/2022 to Jan/202…
☆10 · Jun 4, 2024 · Updated 2 years ago
Evocargo / Lidar-Annotation-is-All-You-Need
2D road segmentation using lidar data during training
☆43 · Dec 21, 2023 · Updated 2 years ago
facebookresearch / mexma
MEXMA: Token-level objectives improve sentence representations
☆43 · Jan 6, 2025 · Updated last year
jouniluoma / bert-ner-cmv
☆13 · Dec 17, 2021 · Updated 4 years ago
uclaml / COPS
The official implementation of Cross-Task Experience Sharing (COPS)
☆29 · Oct 23, 2024 · Updated last year
huggingface / datasets-tagging
A Streamlit app to add structured tags to a dataset card
☆23 · Jun 30, 2022 · Updated 4 years ago
pchizhov / picky_bpe
A BPE modification that removes intermediate tokens during tokenizer training.
☆27 · Nov 25, 2024 · Updated last year
haoyi-duan / DG-SCT
NeurIPS'2023 official implementation code
☆70 · Nov 11, 2023 · Updated 2 years ago