google-research / retvec
RETVec is an efficient, multilingual, and adversarially-robust text vectorizer.
☆288Updated 2 weeks ago
Alternatives and similar repositories for retvec:
Users who are interested in retvec are comparing it to the libraries listed below.
- UniSim is a package for efficient similarity computation, fuzzy matching, and clustering of data.☆132Updated 2 months ago
- ☆15Updated last year
- The Foundation Model Transparency Index☆76Updated 9 months ago
- A Natural Portuguese Language Benchmark (Napolab) for the evaluation of language models.☆66Updated 6 months ago
- The world's largest social media toxicity dataset.☆177Updated 2 years ago
- Code and data to evaluate LLMs on the ENEM, the main standardized Brazilian university admission exams.☆45Updated 2 months ago
- HateBR is the first large-scale expert annotated dataset of Brazilian Instagram comments for hate speech and offensive language detection…☆32Updated last month
- ☆46Updated last year
- Lightweight Nearest Neighbors with Flexible Backends☆244Updated last week
- Alice in Wonderland code base for experiments and raw experiments data☆127Updated 2 weeks ago
- Finetuning InstructLLaMA with Portuguese data☆562Updated last year
- ☆574Updated 2 months ago
- Inference code and configs for the ReplitLM model family☆959Updated last year
- A large-scale information-rich web dataset, featuring millions of real clicked query-document labels☆313Updated 2 months ago
- Granite 3.1 Language Models☆84Updated 2 months ago
- Transformer model for Portuguese language (Brazil pt_BR)☆15Updated 10 months ago
- A fully in-browser privacy solution to make Conversational AI privacy-friendly☆227Updated 4 months ago
- Tasks and tutorials using Graphcore's IPU with Hugging Face. Originally at https://github.com/gradient-ai/Graphcore-HuggingFace☆13Updated 11 months ago
- Masked Python SDK wrapper for OpenAI API. Use public LLM APIs securely.☆116Updated last year
- Definition for Open Weights Licensing☆134Updated 4 months ago
- An implementation of bucketMul LLM inference☆215Updated 7 months ago
- Efficient vector database for hundreds of millions of embeddings.☆206Updated 9 months ago
- ☆26Updated 11 months ago
- GPU-Powered Topic Modelling☆70Updated 2 years ago
- ☆207Updated 7 months ago
- Extend the original llama.cpp repo to support redpajama model.☆117Updated 5 months ago
- ☆199Updated last year
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper.☆47Updated 8 months ago
- Zero-trust AI APIs for easy and private consumption of open-source LLMs☆37Updated 7 months ago
- GPT Takes the Bar Exam☆141Updated 2 years ago