Official code release for "SuperBPE: Space Travel for Language Models"
☆89 · Jan 9, 2026 · Updated last month
Alternatives and similar repositories for superbpe
Users that are interested in superbpe are comparing it to the libraries listed below
- Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated 10 months ago
- Code for SaGe subword tokenizer (EACL 2023) ☆27 · Nov 30, 2024 · Updated last year
- Welcome to our repository! This repository hosts the data on "IndoCollex: A Testbed for Morphological Transformation of Indonesian Word … ☆23 · Aug 10, 2021 · Updated 4 years ago
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆29 · Jul 24, 2025 · Updated 7 months ago
- ☆44 · Feb 11, 2026 · Updated 2 weeks ago
- Code for the paper "Query-Key Normalization for Transformers" ☆52 · Mar 6, 2021 · Updated 4 years ago
- Anh - LAION's multilingual assistant datasets and models ☆27 · Apr 5, 2023 · Updated 2 years ago
- Code and data for the paper "Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?" ☆26 · Jun 3, 2025 · Updated 8 months ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based … ☆11 · Mar 18, 2023 · Updated 2 years ago
- FlexiTokens ☆18 · Dec 27, 2025 · Updated 2 months ago
- Label shift estimation for transfer difficulty with Familiarity. ☆10 · Feb 4, 2025 · Updated last year
- ☆107 · Jun 2, 2025 · Updated 8 months ago
- ☆12 · Dec 13, 2022 · Updated 3 years ago
- Implementation of Cascaded Head-colliding Attention (ACL'2021) ☆11 · Sep 16, 2021 · Updated 4 years ago
- Expert Specialization MoE Solution based on CUTLASS ☆27 · Jan 19, 2026 · Updated last month
- ☆13 · Jul 2, 2025 · Updated 7 months ago
- A community repository for benchmarking Bayesian methods ☆12 · May 25, 2023 · Updated 2 years ago
- ☆19 · Jul 31, 2025 · Updated 7 months ago
- IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization ☆12 · Nov 23, 2021 · Updated 4 years ago
- All-in-one repository for Fine-tuning & Pretraining (Large) Language Models ☆15 · Mar 8, 2023 · Updated 2 years ago
- ☆13 · Feb 7, 2023 · Updated 3 years ago
- Use the tokenizer in parallel to achieve superior acceleration ☆20 · Mar 21, 2024 · Updated last year
- ☆16 · Oct 16, 2024 · Updated last year
- Official implementation of "Data Mixture Inference: What do BPE tokenizers reveal about their training data?" ☆18 · May 15, 2025 · Updated 9 months ago
- "Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition" (EMNLP 2022) ☆16 · Feb 2, 2023 · Updated 3 years ago
- Code and models for the paper titled "Better Feature Integration for Named Entity Recognition", NAACL 2021. ☆30 · Nov 5, 2021 · Updated 4 years ago
- Official code for the NeurIPS25 paper "RAT: Bridging RNN Efficiency and Attention Accuracy in Language Modeling" (https://arxiv.org/abs/25… ☆23 · Dec 10, 2025 · Updated 2 months ago
- The training codes of Jasper-Token-Compression-600M ☆19 · Nov 19, 2025 · Updated 3 months ago
- ☆13 · Dec 17, 2021 · Updated 4 years ago
- ☆14 · Sep 10, 2021 · Updated 4 years ago
- {DeepL, Google, WMT-Best, davinci-003, turbo, gpt-4} × {En-De, En-Cs, En-Ru, En-Zh, De-Fr, En-Ja, Uk-En, Uk-Cs, En-Hr, En-Ha, En-Is} ☆14 · Jun 18, 2023 · Updated 2 years ago
- Code Release for "On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies" ☆16 · Apr 13, 2021 · Updated 4 years ago
- ☆15 · Jul 9, 2025 · Updated 7 months ago
- ☆16 · May 14, 2024 · Updated last year
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 · Jul 28, 2022 · Updated 3 years ago
- PathPiece tokenizer ☆13 · Nov 10, 2024 · Updated last year
- State-of-the-art paired encoder and decoder models (17M-1B params) ☆58 · Aug 6, 2025 · Updated 6 months ago
- [ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning ☆98 · Apr 26, 2023 · Updated 2 years ago
- SCT: An Efficient Self-Supervised Cross-View Training For Sentence Embedding (TACL) ☆16 · Jul 27, 2024 · Updated last year