google/unisim

UniSim is a package for efficient similarity computation, fuzzy matching, and clustering of data.

☆149

Alternatives and similar repositories for unisim

Users who are interested in unisim are comparing it to the libraries listed below.

webis-de / scidata22-stereo-scientific-text-reuse
☆11 · Dec 2, 2024 · Updated last year

huggingface / candle-cublaslt
☆13 · Feb 22, 2024 · Updated 2 years ago

OptimalFoundation / nadir
Nadir: Cutting-edge PyTorch optimizers for simplicity & composability! 🔥🚀💻
☆14 · Jun 15, 2024 · Updated 2 years ago

savasy / TC32
Text Classification Dataset for Turkish Language
☆10 · Nov 16, 2021 · Updated 4 years ago

unum-cloud / USearch
Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C, Python, JavaScript, Rust, Java, Objective-C, S…
☆4,226 · Jul 10, 2026 · Updated last week

KarineAyrs / knowledge-distillation-semantic-search
KDSS is the framework for knowledge distillation from LLMs
☆12 · Nov 5, 2025 · Updated 8 months ago

dgg32 / age_vector
☆14 · Sep 18, 2024 · Updated last year

boschresearch / adversarial_meta_embeddings
Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations"
☆13 · Dec 14, 2021 · Updated 4 years ago

google-research-datasets / swim-ir
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…
☆50 · Nov 13, 2023 · Updated 2 years ago

gojiplus / statqa
Extract Stats Q/A from Tables With Provenance
☆26 · Dec 27, 2025 · Updated 6 months ago

iliaschalkidis / flash-roberta
Hugging Face RoBERTa with Flash Attention 2
☆24 · Sep 14, 2025 · Updated 10 months ago

lightonai / ducksearch
Efficient BM25 with DuckDB 🦆
☆68 · Dec 20, 2024 · Updated last year

re-search / gpt2-estimator
A tf.estimator version of GPT2
☆27 · Jan 29, 2022 · Updated 4 years ago

G-Research / fast-string-search
☆13 · Apr 13, 2021 · Updated 5 years ago

Linzwcs / AFT
☆13 · Jan 22, 2025 · Updated last year

ashvardanian / SmashTable
If only std::set was a DBMS: collection of templated ACID in-memory exception-free thread-safe and concurrent containers in a header-only…
☆45 · Oct 30, 2025 · Updated 8 months ago

ashvardanian / StringTape
Apache Arrow-compatible space-efficient "tape" class in pure Rust to be used with StringZilla for GPU, NUMA, and disk transfers of variab…
☆31 · Updated this week

trapoom555 / Language-Model-STS-CFT
Improving Text Embedding of Language Models Using Contrastive Fine-tuning
☆64 · Aug 2, 2024 · Updated last year

ihmeuw / pseudopeople
pseudopeople is a Python package that generates realistic simulated data about a fictional United States population, designed for use in …
☆25 · Mar 25, 2026 · Updated 3 months ago

infrahq / helm-charts
Infra Helm charts
☆10 · May 27, 2024 · Updated 2 years ago

axeld5 / pali_reason
Testing paligemma2 finetuning on reasoning dataset
☆18 · Dec 28, 2024 · Updated last year

cleanzr / dblink
Distributed Bayesian Entity Resolution in Apache Spark
☆60 · Jun 10, 2021 · Updated 5 years ago

RedisAI / aibench
AIBench, a tool for comparing and evaluating AI serving solutions. forked from [tsbs](https://github.com/timescale/tsbs) and adapted to A…
☆21 · Sep 4, 2024 · Updated last year

unum-cloud / UStore
Multi-Modal Database replacing MongoDB, Neo4J, and Elastic with 1 faster ACID solution, with NetworkX and Pandas interfaces, and bindings…
☆636 · Sep 1, 2023 · Updated 2 years ago

huggingface / huggingface-inference-toolkit
Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models.
☆94 · May 28, 2026 · Updated last month

wenet-e2e / WeSpeech-AI
Open Source Speech/Text Data on AI
☆19 · Sep 13, 2022 · Updated 3 years ago

michaelfeil / candle-flash-attn-v3
☆15 · Dec 21, 2025 · Updated 7 months ago

r-builder / cran2deb
Creating Debian Packages from CRAN Sources
☆12 · Jul 1, 2020 · Updated 6 years ago

joelparkerhenderson / social-value-orientation
Social value orientation (SVO) notes for pro-social pro-self concepts
☆13 · Apr 14, 2025 · Updated last year

langtech-bsc / mt-evaluation
A framework for evaluating Machine Translation models.
☆13 · Apr 21, 2026 · Updated 3 months ago

ictnlp / Seq-NAT
Source code for <Sequence-Level Training for Non-Autoregressive Neural Machine Translation>.
☆24 · Jan 17, 2022 · Updated 4 years ago

catie-aq / flashT5
A fast implementation of T5/UL2 in PyTorch using Flash Attention
☆116 · Oct 30, 2025 · Updated 8 months ago

ashvardanian / NumKong
SIMD-accelerated distances, dot products, matrix ops, geospatial & geometric kernels for 16 numeric types — from 6-bit floats to 64-bit c…
☆1,851 · Updated this week

MinishLab / semhash
Fast Multimodal Semantic Deduplication & Filtering
☆947 · May 24, 2026 · Updated last month

icip-cas / SelfRetrieval
☆41 · Nov 7, 2024 · Updated last year

uktrade / matchbox
Prototype record matching database.
☆28 · Updated this week

Preemo-Inc / gradientai-python-sdk
Interface for interacting with Gradient AI in Python
☆15 · Jun 28, 2024 · Updated 2 years ago

OlivierBinette / er-evaluation
An End-to-End Evaluation Framework for Entity Resolution Systems
☆38 · Dec 3, 2023 · Updated 2 years ago

stephantul / pynife
Nearly Inference Free Embeddings: make your RAG queries 500x faster
☆80 · Apr 27, 2026 · Updated 2 months ago