Datamodels for Hugging Face tokenizers
☆99Mar 2, 2026Updated this week
Alternatives and similar repositories for skeletoken
Users that are interested in skeletoken are comparing it to the libraries listed below
- ☆44Feb 11, 2026Updated 3 weeks ago
- Code for SaGe subword tokenizer (EACL 2023)☆27Nov 30, 2024Updated last year
- Label shift estimation for transfer difficulty with Familiarity.☆10Feb 4, 2025Updated last year
- decontamination☆26Dec 3, 2025Updated 3 months ago
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks☆12Nov 9, 2021Updated 4 years ago
- 🤝 Trade any tensors over the network☆31Sep 27, 2023Updated 2 years ago
- 🔢 Work with static vector models☆37Apr 21, 2025Updated 10 months ago
- A tiny BERT for low-resource monolingual models☆31Dec 24, 2025Updated 2 months ago
- ANE accelerated embedding models!☆20Dec 11, 2024Updated last year
- ☆20Oct 5, 2025Updated 5 months ago
- ☆12Dec 6, 2024Updated last year
- PANiC - PAraphrasing Noun-Compounds☆15Apr 6, 2018Updated 7 years ago
- Smaller and faster nanochat in MLX☆37Nov 15, 2025Updated 3 months ago
- Python client SDK for Ultravox.☆16Dec 10, 2025Updated 2 months ago
- This repository includes the masking vocabulary used in the ICLR 2021 spotlight PMI-Masking paper☆14Aug 9, 2021Updated 4 years ago
- Legalpioneer dataset☆15Apr 10, 2025Updated 10 months ago
- Hugging Face Jobs☆19Jul 11, 2025Updated 7 months ago
- Pre-train Static Word Embeddings☆93Sep 9, 2025Updated 5 months ago
- A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining☆18Nov 26, 2023Updated 2 years ago
- Code for ACL 2023 Paper: ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER☆21Jul 19, 2023Updated 2 years ago
- MSPaint for marimo and other Python notebooks☆24Oct 24, 2025Updated 4 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization☆72Feb 29, 2024Updated 2 years ago
- ☆82Nov 21, 2025Updated 3 months ago
- ☆107Jun 2, 2025Updated 9 months ago
- Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME)☆22Apr 11, 2020Updated 5 years ago
- A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers.☆64Jul 6, 2025Updated 8 months ago
- A Python library aimed at dissecting and augmenting NER training data.☆61May 11, 2023Updated 2 years ago
- Nearly Inference Free Embeddings: make your RAG queries 500x faster☆70Feb 20, 2026Updated 2 weeks ago
- CERberus -- guardian against character errors☆29Feb 15, 2024Updated 2 years ago
- ☆32Dec 2, 2024Updated last year
- Small python package to measure OCR quality and other related metrics.☆27Feb 19, 2024Updated 2 years ago
- Truly flash implementation of DeBERTa disentangled attention mechanism.☆81Feb 10, 2026Updated 3 weeks ago
- 🚀🤗 A collection of templates for Hugging Face Spaces☆35Oct 9, 2023Updated 2 years ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.)☆32May 25, 2024Updated last year
- mkdocs plugin for reactive and interactive docs with marimo☆99Updated this week
- Efficiently find the best-suited language model (LM) for your NLP task☆135Jul 26, 2025Updated 7 months ago
- Command Line Interface for Hugging Face Inference Endpoints☆65Apr 10, 2024Updated last year
- Dashboard v5 Coming Soon!!☆63Feb 15, 2026Updated 2 weeks ago
- This repository contains the code for the Form-Context Model and its Attentive Mimicking variant.☆31May 11, 2020Updated 5 years ago