Trainable embedding transformation for confidence estimation, feature extraction, explainability and conversion from dense to sparse.
☆28 · Jun 9, 2025 · Updated last year
Alternatives and similar repositories for block-embeddings
Users interested in block-embeddings are comparing it to the libraries listed below.
- Contextualized per-token embeddings ☆36 · Updated this week
- A framework for evaluating semantic search across custom datasets, metrics, and embedding backends. ☆39 · May 31, 2026 · Updated last week
- Load embeddings and featurize your sentences. ☆31 · Oct 23, 2024 · Updated last year
- ☆11 · Dec 31, 2024 · Updated last year
- Header-only C++/Python library for fast approximate nearest neighbors ☆18 · Feb 9, 2020 · Updated 6 years ago
- Benchmarks for LLM tokenizers ☆18 · Mar 25, 2026 · Updated 2 months ago
- A universal Qdrant table frontend based on transformers.js ☆20 · Mar 26, 2024 · Updated 2 years ago
- Automatically exported from code.google.com/p/transducersaurus ☆11 · Apr 1, 2015 · Updated 11 years ago
- Benchmark scripts for comparing different tokenizers and sentence segmenters for German ☆12 · Feb 27, 2023 · Updated 3 years ago
- Simple customizable evaluation for text retrieval performance of Sentence Transformers embedders on PDFs ☆30 · Jan 20, 2025 · Updated last year
- FlexiTokens ☆23 · Dec 27, 2025 · Updated 5 months ago
- This project implements a Retrieval-Augmented Generation (RAG) system that can handle different types of files. The system uses FastAPI f… ☆33 · May 29, 2025 · Updated last year
- Nanoloop source files for the album "Prime 16" ☆11 · Mar 7, 2026 · Updated 3 months ago
- ☆12 · Sep 1, 2021 · Updated 4 years ago
- Fast tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tok… ☆53 · May 10, 2026 · Updated last month
- Supervised and unsupervised self-organising maps ☆13 · Mar 11, 2026 · Updated 2 months ago
- ☆11 · Apr 11, 2019 · Updated 7 years ago
- Pretraining summarization models using a corpus of nonsense ☆13 · Sep 28, 2021 · Updated 4 years ago
- Model implementation for the contextual embeddings project ☆47 · Jun 2, 2025 · Updated last year
- ☆13 · Jul 8, 2020 · Updated 5 years ago
- Tool to migrate data into Qdrant ☆78 · Updated this week
- Code for the paper "Modelling Latent Translations for Cross-Lingual Transfer" ☆17 · Nov 22, 2021 · Updated 4 years ago
- hydra-pl-wandb-sample-project is NN experiment-management code using Hydra, PyTorch Lightning, and wandb. ☆11 · Nov 22, 2021 · Updated 4 years ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆20 · Feb 7, 2023 · Updated 3 years ago
- Echo State Network ☆17 · May 2, 2014 · Updated 12 years ago
- Demo example of consumer goods categorization ☆31 · Updated this week
- ☆10 · Jul 23, 2021 · Updated 4 years ago
- An English lexical database from the Big 🍎, let's go Mets baby love da Mets ☆18 · May 19, 2026 · Updated 3 weeks ago
- Learning to Hash for Maximum Inner Product Search ☆12 · Jan 21, 2022 · Updated 4 years ago
- A pronunciation trainer w/ Python. ☆15 · Sep 28, 2025 · Updated 8 months ago
- Command-line (CLI) coffee journal designed for coffee enthusiasts. (https://codeberg.org/mrus/kopi) ☆14 · Dec 15, 2025 · Updated 5 months ago
- Deep CCA (DCCA) network ☆15 · Apr 12, 2016 · Updated 10 years ago
- This repository contains a demonstrative implementation for pooling-based models, e.g., DeepPyramidion complementing our paper "Sparsifyi… ☆14 · May 15, 2022 · Updated 4 years ago
- A Chainer implementation of doc2vec ☆10 · Nov 16, 2017 · Updated 8 years ago
- Source code of "Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers" (EMNLP 2025) ☆17 · Jan 12, 2026 · Updated 4 months ago
- Source code for GlorIA models pre-training. ☆23 · Apr 3, 2024 · Updated 2 years ago
- Emergent Communication Pretraining for Few-Shot Machine Translation ☆13 · Dec 3, 2020 · Updated 5 years ago
- ☆19 · Apr 27, 2023 · Updated 3 years ago
- A magic notepad. δ ☆14 · May 21, 2023 · Updated 3 years ago