orevaahia/magnet-tokenization

☆11

Alternatives and similar repositories for magnet-tokenization

Users that are interested in magnet-tokenization are comparing it to the libraries listed below.

owos / flexitokens
FlexiTokens
☆23 · Dec 27, 2025 · Updated 7 months ago
swiss-ai / parity-aware-bpe
Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization [ACL 2026]
☆20 · Apr 18, 2026 · Updated 3 months ago
SALT-NLP / multi-value
Complete set of English dialect transformation rules and evaluation code
☆16 · Jun 7, 2024 · Updated 2 years ago
LuisaMaerz / KnowMAN
KnowMAN: Weakly Supervised Multinomial Adversarial Networks
☆12 · Nov 9, 2021 · Updated 4 years ago
KoelLabs / ML
Koel Labs innovates open-source speech research, inclusive speech technologies, and real-time pronunciation feedback for language learner…
☆25 · Jul 13, 2026 · Updated 2 weeks ago
ltgoslo / factorizer
☆16 · May 14, 2024 · Updated 2 years ago
DFKI-NLP / MobIE
[Konvens21] This repository contains the DFKI MobIE Corpus, a dataset of 3,232 German-language documents that have been annotated with fi…
☆12 · Sep 17, 2024 · Updated last year
cisnlp / bias-in-nlp
Literature overview: gender bias in natural language processing
☆12 · Jan 26, 2021 · Updated 5 years ago
PiotrNawrot / dynamic-pooling
Efficient Transformers with Dynamic Token Pooling
☆68 · May 20, 2023 · Updated 3 years ago
kdu4108 / context-vs-prior-finetuning
☆15 · May 27, 2025 · Updated last year
schwartz-lab-NLP / Tokens2Words
☆16 · Apr 2, 2025 · Updated last year
teffland / ner-expected-entity-ratio
Implementation and experiments for Partially Supervised NER via Expected Entity Ratio in TACL 2022
☆14 · Nov 7, 2022 · Updated 3 years ago
valentinhofmann / politosphere
☆19 · Jun 7, 2022 · Updated 4 years ago
smallbenchnlp / ELECTRA-DeBERTa
☆16 · Dec 14, 2022 · Updated 3 years ago
paul-rottger / issuebench
Röttger et al. (2024): "IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance"
☆17 · Mar 6, 2026 · Updated 4 months ago
PythonNut / superbpe
Official code release for "SuperBPE: Space Travel for Language Models"
☆97 · May 28, 2026 · Updated 2 months ago
yangyu12 / lagm
Learning to Annotate Part Segmentation with Gradient Matching (ICLR 2022)
☆12 · Apr 26, 2022 · Updated 4 years ago
cindyxinyiwang / expand-via-lexicon-based-adaptation
Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"
☆29 · Apr 2, 2022 · Updated 4 years ago
david-gimeno / tailored-avsr
Official source code for the paper "Tailored Design of Audio-Visual Speech Recognition Models using Branchformers"
☆15 · Feb 24, 2025 · Updated last year
liyaguang / st-gcn
Spatial Temporal Graph Convolutional Networks (ST-GCN) for Skeleton-Based Action Recognition in PyTorch
☆14 · Jan 4, 2019 · Updated 7 years ago
cosmaadrian / psymo
Repository for the WACV 2024 paper "PsyMo: A Dataset for Estimating Self-Reported Psychological Traits from Gait"
☆14 · Feb 22, 2024 · Updated 2 years ago
turkish-nlp-suite / Turkish-Wiki-NER-Dataset
Repo for Turkish Wiki NER dataset.
☆13 · Jul 11, 2023 · Updated 3 years ago
allenai / EmbeddingRecycling
Embedding Recycling for Language models
☆38 · Jul 11, 2023 · Updated 3 years ago
Betswish / MIRAGE
Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/
☆25 · Mar 10, 2025 · Updated last year
jjzha / cartography-al
Code base for the EMNLP 2021 Findings paper: Cartography Active Learning
☆14 · Jun 3, 2025 · Updated last year
AI21Labs / pmi-masking
This repository includes the masking vocabulary used in the ICLR 2021 spotlight PMI-Masking paper
☆14 · Aug 9, 2021 · Updated 4 years ago
yuweihao / LV-BERT
LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)
☆18 · May 10, 2023 · Updated 3 years ago
contrebande-labs / charred
CHARacter-awaRE Diffusion: Multilingual Character-Aware Encoders for Font-Aware Diffusers That Can Actually Spell
☆14 · May 28, 2023 · Updated 3 years ago
sutdcv / Chaotic-World
[ICCV2023] Chaotic World: A Large and Challenging Benchmark for Human Behavior Understanding in Chaotic Events
☆10 · Dec 7, 2024 · Updated last year
skai-research / ScholarEval
Official code and data for the paper "ScholarEval: Research Idea Evaluation Grounded in Literature."
☆20 · Oct 28, 2025 · Updated 9 months ago
allenai / fluid-benchmarking
Fluid Language Model Benchmarking
☆29 · Sep 16, 2025 · Updated 10 months ago
helboukkouri / character-bert-pretraining
Code for pre-training CharacterBERT models (as well as BERT models).
☆34 · Sep 6, 2021 · Updated 4 years ago
twinkle0331 / Xcompression
[ICLR 2022] Code for paper "Exploring Extreme Parameter Compression for Pre-trained Language Models" (https://arxiv.org/abs/2205.10036)
☆22 · May 24, 2023 · Updated 3 years ago
faker2048 / youtube-faster-whisper
YTWS is a simple CLI tool that downloads YouTube videos and creates subtitles quickly. It uses yt-dlp for downloading and faster-whisper …
☆47 · Oct 18, 2025 · Updated 9 months ago
huawei-lin / RapidIn
RapidIn: Scalable Influence Estimation for Large Language Models (LLMs). The implementation for paper "Token-wise Influential Training Da…
☆22 · Mar 10, 2026 · Updated 4 months ago
tonybaloney / spew
A tool for generating random, syntactically-correct Python code. Designed for fuzzing and testing of tools that parse Python code.
☆23 · Sep 22, 2023 · Updated 2 years ago
allenai / noncompliance
This repository contains data, code and models for contextual noncompliance.
☆26 · Jul 18, 2024 · Updated 2 years ago
frankaging / Causal-Distill
The Codebase for Causal Distillation for Language Models (NAACL '22)
☆26 · May 1, 2022 · Updated 4 years ago
cisnlp / ofa
[NAACL 2024] A Framework aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining
☆18 · Nov 26, 2023 · Updated 2 years ago