stephantul/skeletoken

Datamodels for hugging face tokenizers

☆109

Alternatives and similar repositories for skeletoken

Users that are interested in skeletoken are comparing it to the libraries listed below.

stephantul / pynife
Nearly Inference Free Embeddings: make your RAG queries 500x faster
☆80 · Apr 27, 2026 · Updated 3 months ago
MeLeLBGU / SaGe
Code for SaGe subword tokenizer (EACL 2023)
☆28 · Nov 30, 2024 · Updated last year
flairNLP / familiarity
Label shift estimation for transfer difficulty with Familiarity.
☆10 · Feb 4, 2025 · Updated last year
MinishLab / tokenlearn
Pre-train Static Word Embeddings
☆109 · Jun 9, 2026 · Updated last month
Pringled / agentcheck
Check what an AI agent can access before you run it
☆27 · Mar 8, 2026 · Updated 4 months ago
ejaasaari / lemur
[ICML'26] LEMUR reduces multi-vector retrieval for late interaction models such as ColBERT into regular single-vector retrieval.
☆31 · Jun 21, 2026 · Updated last month
chandar-lab / NeoBERT
☆109 · Jun 2, 2025 · Updated last year
sanderland / script_tok
Code for the paper "BPE stays on SCRIPT", "Which Pieces Does Unigram Tokenization Really Need?" and MinGram
☆18 · Updated this week
neuml / staticvectors
🔢 Work with static vector models
☆39 · Apr 21, 2025 · Updated last year
owos / flexitokens
FlexiTokens
☆23 · Dec 27, 2025 · Updated 7 months ago
AnswerDotAI / fastkmeans
☆103 · Jul 4, 2025 · Updated last year
Knowledgator / GLiClass
Generalist and Lightweight Model for Text Classification
☆235 · Jul 21, 2026 · Updated last week
insait-institute / ritranslation
[ACL'26 Findings] Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets
☆20 · Jun 27, 2026 · Updated last month
alea-institute / nupunkt
Next-generation Punkt sentence boundary detection with zero dependencies
☆32 · Nov 18, 2025 · Updated 8 months ago
huggingface / ember
ANE accelerated embedding models!
☆20 · Dec 11, 2024 · Updated last year
lightonai / pylate
Late Interaction Models Training & Retrieval
☆876 · Updated this week
PythonNut / superbpe
Official code release for "SuperBPE: Space Travel for Language Models"
☆97 · May 28, 2026 · Updated 2 months ago
MinishLab / model2vec-rs
Official Rust Implementation of Model2Vec
☆203 · May 24, 2026 · Updated 2 months ago
davanstrien / haiku-dpo
Using open source LLMs to build synthetic datasets for direct preference optimization
☆72 · Feb 29, 2024 · Updated 2 years ago
raphaelsty / LeNLP
NLP with Rust for Python 🦀🐍
☆72 · Jun 9, 2026 · Updated last month
davidberenstein1957 / fast-sentence-transformers
Simply, faster, sentence-transformers
☆144 · Aug 27, 2024 · Updated last year
Pringled / pyversity
Fast Diversification for Search & Retrieval
☆493 · May 24, 2026 · Updated 2 months ago
lightonai / pylate-rs
PyLate efficient inference engine
☆87 · Jan 7, 2026 · Updated 6 months ago
MinishLab / semhash
Fast Multimodal Semantic Deduplication & Filtering
☆954 · May 24, 2026 · Updated 2 months ago
stefan-it / modern-bert-ner
My NER Experiments with ModernBERT and Ettin
☆29 · Jul 17, 2025 · Updated last year
Knowledgator / GLinker
Efficient and scalable zero-shot entity linking
☆140 · Jul 20, 2026 · Updated last week
instructkr / bb25
bb25 is a fast, self-contained BM25 + Bayesian calibration implementation with a minimal Python API.
☆148 · Mar 17, 2026 · Updated 4 months ago
marimo-team / marimo-operator
k8s operator and plugin for marimo deployment
☆24 · Updated this week
mixedbread-ai / mxbai-rerank
Crispy reranking models by Mixedbread
☆52 · Sep 17, 2025 · Updated 10 months ago
jina-ai / jzip-compressor
Compression for unit-norm embedding vectors using spherical coordinates
☆83 · Jan 23, 2026 · Updated 6 months ago
JHU-CLSP / mmBERT
A massively multilingual modern encoder language model
☆145 · Jan 20, 2026 · Updated 6 months ago
chainyo / tensorshare
🤝 Trade any tensors over the network
☆31 · Sep 27, 2023 · Updated 2 years ago
malteos / clp-transfer
Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning
☆30 · Jan 25, 2023 · Updated 3 years ago
argilla-io / awesome-llm-datasets
👩🤝🤖 A curated list of datasets for large language models (LLMs), RLHF and related resources (continually updated)
☆26 · May 2, 2023 · Updated 3 years ago
cisnlp / GlotWeb
[WWW 2026] 🕸 GlotWeb: Web Indexing for Minority Languages
☆17 · Apr 14, 2026 · Updated 3 months ago
flairNLP / transformer-ranker
Efficiently find the best-suited language model (LM) for your NLP task
☆134 · Jul 26, 2025 · Updated last year
vered1986 / panic
PANiC - PAraphrasing Noun-Compounds
☆15 · Apr 6, 2018 · Updated 8 years ago
MinishLab / model2vec
Fast State-of-the-Art Static Embeddings
☆2,166 · Jun 6, 2026 · Updated last month
BioMikeUkr / nlp-puzzles
A hands-on NLP/ML curriculum for trainees and juniors who already know basic Python, ML and want to get production-ready in AI/ML enginee…
☆73 · Mar 3, 2026 · Updated 4 months ago