MinishLab/tokenlearn

Pre-train Static Word Embeddings

☆108

Alternatives and similar repositories for tokenlearn

Users who are interested in tokenlearn are comparing it to the libraries listed below.

stephantul / pynife
Nearly Inference Free Embeddings: make your RAG queries 500x faster
☆80 · Apr 27, 2026 · Updated 2 months ago

MinishLab / model2vec
Fast State-of-the-Art Static Embeddings
☆2,159 · Jun 6, 2026 · Updated last month

MinishLab / semhash
Fast Multimodal Semantic Deduplication & Filtering
☆946 · May 24, 2026 · Updated last month

hieudx149 / X-RetroMAE
Code Roberta version of RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder
☆10 · Mar 16, 2023 · Updated 3 years ago

neuml / staticvectors
🔢 Work with static vector models
☆39 · Apr 21, 2025 · Updated last year

MinishLab / model2vec-rs
Official Rust Implementation of Model2Vec
☆197 · May 24, 2026 · Updated last month

stephantul / reach
Load embeddings and featurize your sentences.
☆31 · Oct 23, 2024 · Updated last year

Pringled / agentcheck
Check what an AI agent can access before you run it
☆27 · Mar 8, 2026 · Updated 4 months ago

hotchpotch / yasem
YASEM - Yet Another Splade|Sparse Embedder - A simple and efficient library for SPLADE embeddings
☆13 · May 22, 2025 · Updated last year

lightonai / pylate
Late Interaction Models Training & Retrieval
☆875 · Jul 13, 2026 · Updated last week

stephantul / skeletoken
Datamodels for hugging face tokenizers
☆108 · Jun 18, 2026 · Updated last month

owos / flexitokens
FlexiTokens
☆23 · Dec 27, 2025 · Updated 6 months ago

Knowledgator / GLiClass
Generalist and Lightweight Model for Text Classification
☆233 · Updated this week

MantisAI / sieves
Plug-and-play document AI with zero-shot models.
☆126 · May 11, 2026 · Updated 2 months ago

instructkr / bb25
bb25 is a fast, self-contained BM25 + Bayesian calibration implementation with a minimal Python API.
☆148 · Mar 17, 2026 · Updated 4 months ago

ielab / Starbucks
Starbucks: Improved Training for 2D Matryoshka Embeddings
☆25 · Jun 30, 2025 · Updated last year

Pringled / pyversity
Fast Diversification for Search & Retrieval
☆492 · May 24, 2026 · Updated last month

JHU-CLSP / ettin-encoder-vs-decoder
State-of-the-art paired encoder and decoder models (17M-1B params)
☆74 · Aug 6, 2025 · Updated 11 months ago

huggingface / ai-blueprint
A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs …
☆66 · Feb 6, 2025 · Updated last year

machinelearningZH / hybrid-search-eval
A framework for benchmarking embedding models in hybrid search scenarios (BM25 + vector search) using Weaviate.
☆40 · Updated this week

ibm-granite / granite-embedding-models
☆77 · May 14, 2026 · Updated 2 months ago

Reaper2403 / slm-llm-grounding-playbook
Architecture pattern for combining a fast LLM voice loop with a slower SLM that tracks hard facts.
☆15 · Apr 27, 2026 · Updated 2 months ago

s-smits / modernbert-finetune
Fine-tune ModernBERT with custom tokenizers, curriculum learning, and next-gen optimizers.
☆74 · Jan 16, 2026 · Updated 6 months ago

machinelearningZH / semantic-search-eval
A framework for evaluating semantic search across custom datasets, metrics, and embedding backends.
☆39 · Jul 9, 2026 · Updated last week

KRLabsOrg / verbatim-rag
Hallucination-prevention RAG system with verbatim span extraction. Ensures all generated content is grounded in source documents with exa…
☆202 · Jul 13, 2026 · Updated last week

othr-nlp / rage_toolkit
☆11 · Sep 27, 2024 · Updated last year

qdrant / block-embeddings
Trainable embedding transformation for confidence estimation, feature extraction, explainability and conversion from dense to sparse.
☆28 · Jun 23, 2026 · Updated 3 weeks ago

oceanumeric / EnteRAG
A RAG that can scale 🧑🏻‍💻
☆11 · May 28, 2024 · Updated 2 years ago

Knowledgator / GLinker
Efficient and scalable zero-shot entity linking
☆140 · May 21, 2026 · Updated 2 months ago

thad0ctor / KrunchWrapper
☆18 · Jul 1, 2025 · Updated last year

Babelscape / WSL
Word Sense Linking model is designed to identify and disambiguate spans of text to their most suitable senses from a reference inventory.
☆13 · Aug 23, 2024 · Updated last year

enjalot / latent-sae
Training code for Sparse Autoencoders on Embedding models
☆39 · Jul 11, 2026 · Updated last week

PrithivirajDamodaran / Route0x
Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da
☆122 · Mar 31, 2025 · Updated last year

x-tabdeveloping / turftopic
Robust and fast topic models with sentence-transformers.
☆118 · Updated this week

Knowledgator / LiqFit
Efficient few-shot learning with cross-encoders.
☆68 · Feb 16, 2024 · Updated 2 years ago

Knowledgator / FlashDeBERTa
Truly flash implementation of DeBERTa disentangled attention mechanism.
☆90 · Feb 10, 2026 · Updated 5 months ago

cognica-io / bayesian-bm25
Bayesian probability transforms for BM25 retrieval scores
☆77 · Jun 20, 2026 · Updated last month

LAGoM-NLP / transtokenizer
☆57 · Dec 27, 2025 · Updated 6 months ago

knowledgeable-embedding / knowledgeable-embedding
Knowledgeable Embedding: Injecting dynamically updatable entity knowledge into embeddings to enhance RAG
☆15 · Aug 31, 2025 · Updated 10 months ago