google-research / retvec
RETVec is an efficient, multilingual, and adversarially-robust text vectorizer.
☆293Updated 9 months ago
Alternatives and similar repositories for retvec
Users who are interested in retvec are comparing it to the libraries listed below.
- UniSim is a package for efficient similarity computation, fuzzy matching, and clustering of data.☆145Updated 9 months ago
- ☆19Updated 2 years ago
- The Foundation Model Transparency Index☆85Updated last month
- Your buddy in the (L)LM space.☆64Updated last year
- Managing the lifecycle of machine learning to support scalability, impact, collaboration, compliance and sharing.☆91Updated this week
- Efficient vector database for hundreds of millions of embeddings.☆211Updated last year
- ☆339Updated 2 years ago
- The Natural Portuguese Language Benchmark (Napolab). Stay up to date with the latest advancements in Portuguese language models and their…☆71Updated 5 months ago
- BlindBox is a tool to isolate and deploy applications inside Trusted Execution Environments for privacy-by-design apps☆63Updated 2 years ago
- Neural Search☆367Updated 10 months ago
- ☆115Updated 11 months ago
- Statistics of Common Crawl monthly archives mined from URL index files☆208Updated this week
- Zero-trust AI APIs for easy and private consumption of open-source LLMs☆41Updated last year
- Lightweight Nearest Neighbors with Flexible Backends☆330Updated 3 weeks ago
- Source code for Mozilla.ai's Lumigator platform☆275Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs☆53Updated 2 years ago
- a small code base for training large models☆318Updated 8 months ago
- The world's largest social media toxicity dataset.☆189Updated 3 years ago
- GPT Takes the Bar Exam☆142Updated 3 years ago
- Full text search that feels like a numpy array☆295Updated last month
- Common crawl extractor☆84Updated last year
- The Institutional Data Initiative's pipeline for analyzing, refining, and publishing the Institutional Books 1.0 collection.☆48Updated last month
- ☆121Updated 3 years ago
- Train a model, and detect gibberish strings with it.☆68Updated 3 years ago
- Gzip and nearest neighbors for text classification☆57Updated 2 years ago
- ☆47Updated last year
- TitanML Takeoff Server is an optimization, compression and deployment platform that makes state of the art machine learning models access…☆114Updated last year
- Creating the tools and data sets necessary to evaluate vulnerabilities in LLMs.☆27Updated 10 months ago
- ☆198Updated last year
- Official implementation of "WhisperNER: Unified Open Named Entity and Speech Recognition"☆200Updated 10 months ago