Trainable embedding transformation for confidence estimation, feature extraction, explainability and conversion from dense to sparse.
☆26Jun 9, 2025Updated 9 months ago
Alternatives and similar repositories for block-embeddings
Users that are interested in block-embeddings are comparing it to the libraries listed below
- Contextualized per-token embeddings☆34May 11, 2025Updated 10 months ago
- *high-load* benchmarking tool☆16Mar 12, 2026Updated last week
- ☆37Nov 21, 2024Updated last year
- A framework for evaluating semantic search across custom datasets, metrics, and embedding backends.☆38May 26, 2025Updated 9 months ago
- ☆11Dec 31, 2024Updated last year
- Node starter kit for semantic-search. Uses Mighty Inference Server with Qdrant vector search.☆15May 15, 2023Updated 2 years ago
- .NET client for Qdrant vector database☆17Dec 1, 2023Updated 2 years ago
- Header-only C++/python library for fast approximate nearest neighbors☆18Feb 9, 2020Updated 6 years ago
- benchmarks for LLM tokenizers☆17Feb 27, 2026Updated 3 weeks ago
- Javascript client library for the Qdrant vector search engine☆16Jun 19, 2022Updated 3 years ago
- HNSW implemented in Python☆21Nov 28, 2019Updated 6 years ago
- Benchmark scripts for comparing different tokenizers and sentence segmenters of German☆12Feb 27, 2023Updated 3 years ago
- ☆11Nov 17, 2018Updated 7 years ago
- ☆12Sep 1, 2021Updated 4 years ago
- Supervised and unsupervised self-organising maps☆12Mar 11, 2026Updated last week
- Model implementation for the contextual embeddings project☆42Jun 2, 2025Updated 9 months ago
- Pretraining summarization models using a corpus of nonsense☆13Sep 28, 2021Updated 4 years ago
- ☆13Jul 8, 2020Updated 5 years ago
- Tool to migrate data into Qdrant☆70Updated this week
- Code for the paper "Modelling Latent Translations for Cross-Lingual Transfer"☆17Nov 22, 2021Updated 4 years ago
- hydra-pl-wandb-sample-project is NN experiment management code using hydra, pytorch-lightning, and wandb.☆11Nov 22, 2021Updated 4 years ago
- Tokenization across languages. Useful as preprocessing for subword tokenization.☆21Feb 7, 2023Updated 3 years ago
- Echo State Network☆17May 2, 2014Updated 11 years ago
- Dcup - Advanced RAG for Personal Knowledge ☕☆52Feb 20, 2026Updated last month
- General purpose benchmarking tool for turbopuffer deployments☆27Updated this week
- Demo example of consumer goods categorization☆30Nov 23, 2023Updated 2 years ago
- An English lexical database from the Big 🍎, let's go Mets baby love da Mets☆18Dec 12, 2025Updated 3 months ago
- Learning to Hash for Maximum Inner Product Search☆12Jan 21, 2022Updated 4 years ago
- Command-line (CLI) coffee journal designed for coffee enthusiasts. (https://codeberg.org/mrus/kopi)☆14Dec 15, 2025Updated 3 months ago
- Source code for GlorIA models pre-training.☆21Apr 3, 2024Updated last year
- A Chainer implementation of doc2vec☆10Nov 16, 2017Updated 8 years ago
- Source code of "Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers" EMNLP 2025☆17Jan 12, 2026Updated 2 months ago
- Emergent Communication Pretraining for Few-Shot Machine Translation☆13Dec 3, 2020Updated 5 years ago
- pyndl implements Naive Discriminative Learning, a learning and classification model based on the Rescorla-Wagner equations in …☆13Dec 8, 2025Updated 3 months ago
- 🦉 Your Junk-Free, Personalized Information and Podcasts.☆22Jun 13, 2025Updated 9 months ago
- UMLS in Python with MongoDB.☆18Dec 7, 2018Updated 7 years ago
- ☆18Feb 1, 2023Updated 3 years ago
- Python interface for evaluation on ChatGPT models☆19Feb 13, 2024Updated 2 years ago
- plugin manager for OpenVoiceOS, STT/TTS/Wakewords that can be used anywhere☆13Mar 12, 2026Updated last week