A byte-level decoder architecture that matches the performance of tokenized Transformers.
☆67 · Apr 24, 2024 · Updated last year
Alternatives and similar repositories for spacebyte
Users who are interested in spacebyte are comparing it to the repositories listed below
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation" ☆22 · Feb 14, 2024 · Updated 2 years ago
- ☆138 · Aug 19, 2024 · Updated last year
- ☆54 · Jul 16, 2025 · Updated 7 months ago
- JAX Scalify: end-to-end scaled arithmetics ☆18 · Oct 30, 2024 · Updated last year
- ☆11 · Oct 11, 2023 · Updated 2 years ago
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023] ☆14 · Jul 11, 2023 · Updated 2 years ago
- Streaming Vocos ☆30 · Jun 10, 2025 · Updated 8 months ago
- Code repository for the paper "MrT5: Dynamic Token Merging for Efficient Byte-level Language Models." ☆54 · Sep 25, 2025 · Updated 5 months ago
- RWKV-7 mini ☆12 · Mar 29, 2025 · Updated 11 months ago
- 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ☆11 · Apr 6, 2025 · Updated 10 months ago
- unofficial pytorch implementation of HiFi-GAN with fast MISR. ☆15 · Mar 21, 2023 · Updated 2 years ago
- ☆16 · Feb 6, 2024 · Updated 2 years ago
- MPI Code Generation through Domain-Specific Language Models ☆14 · Nov 19, 2024 · Updated last year
- ☆14 · Aug 1, 2025 · Updated 7 months ago
- Code for our TSD paper "TOKEN is a MASK: Few-shot Named Entity Recognition with Pre-trained Language Models" ☆14 · Aug 19, 2022 · Updated 3 years ago
- Viterbi decoding in PyTorch ☆41 · Sep 10, 2025 · Updated 5 months ago
- Master thesis: Exploring bias in German NLG (GPT-3 & GerPT-2). Applies regard classification and bias mitigation triggers. ☆16 · Sep 25, 2024 · Updated last year
- Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations" ☆13 · Dec 14, 2021 · Updated 4 years ago
- An unofficial pytorch implementation of 'Efficient Infinite Context Transformers with Infini-attention' ☆55 · Aug 19, 2024 · Updated last year
- Official implementation of "GPT or BERT: why not both?" ☆62 · Jul 28, 2025 · Updated 7 months ago
- PyTorch implementation of StableMask (ICML'24) ☆15 · Jun 27, 2024 · Updated last year
- ☆16 · Dec 12, 2023 · Updated 2 years ago
- Minimal code to train ELMo models in recent versions of TensorFlow ☆14 · Apr 30, 2023 · Updated 2 years ago
- "PyTorch in Rust" ☆17 · Feb 13, 2024 · Updated 2 years ago
- Repository for contributions for Data Generation for Post-OCR correction of Cyrillic handwriting paper ☆21 · Nov 27, 2023 · Updated 2 years ago
- [ICCV2025] WikiAutoGen official page ☆24 · Feb 6, 2026 · Updated 3 weeks ago
- End-To-End SpeechSynthesis system with knowledge distillation ☆18 · Jul 16, 2022 · Updated 3 years ago
- Visualising Losses in Deep Neural Networks ☆16 · Jul 17, 2024 · Updated last year
- ☆15 · Mar 22, 2023 · Updated 2 years ago
- A repository for research on medium sized language models. ☆78 · May 23, 2024 · Updated last year
- Blog post ☆17 · Feb 16, 2024 · Updated 2 years ago
- Scaling Sparse Fine-Tuning to Large Language Models ☆18 · Jan 31, 2024 · Updated 2 years ago
- Statistics on multilingual datasets ☆17 · Jul 12, 2022 · Updated 3 years ago
- Custom triton kernels for training Karpathy's nanoGPT. ☆19 · Oct 21, 2024 · Updated last year
- Code for ACL 2023 Paper: ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER ☆21 · Jul 19, 2023 · Updated 2 years ago
- ☆23 · Jan 27, 2025 · Updated last year
- Flax Image Models - State-of-the-art pre-trained vision backbones for Flax. ☆23 · Jun 5, 2025 · Updated 8 months ago
- ☆82 · Jan 22, 2025 · Updated last year
- Triton-based implementation of Sparse Mixture of Experts. ☆266 · Oct 3, 2025 · Updated 5 months ago