huggingface / dedupe_estimator
Chunk Dedupe Estimation
☆17 Updated 10 months ago
Alternatives and similar repositories for dedupe_estimator
Users who are interested in dedupe_estimator are comparing it to the libraries listed below.
- Rust crates for XetHub ☆60 Updated 11 months ago
- Smart reproducible analytical pipeline inspection ☆19 Updated 5 months ago
- ☆12 Updated last year
- ☆39 Updated this week
- Python SDK for XetHub ☆56 Updated 11 months ago
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg… ☆24 Updated 6 months ago
- Radio is a DuckDB extension by Query.Farm that brings real-time event streams into your SQL workflows. It enables DuckDB to receive and s… ☆30 Updated 3 months ago
- First token cutoff sampling inference example ☆31 Updated last year
- ☆15 Updated 2 weeks ago
- ☆12 Updated last year
- Your buddy in the (L)LM space. ☆64 Updated last year
- tsellm: LLMs in SQLite and DuckDB ☆24 Updated 5 months ago
- Vector Database with support for late interaction and token level embeddings. ☆55 Updated 3 months ago
- FalkorDB-Browser is a visualization UI for FalkorDB. ☆54 Updated this week
- ☆20 Updated 11 months ago
- ColBERT for live vector indexes ☆28 Updated 11 months ago
- Embedding models from Jina AI ☆65 Updated last year
- ☆28 Updated 5 months ago
- Pivotal Token Search ☆125 Updated 2 months ago
- Modular, open source LLMOps stack that separates concerns: LiteLLM unifies LLM APIs, manages routing and cost controls, and ensures high-… ☆116 Updated 7 months ago
- Efficiently computing & storing token n-grams from large corpora ☆26 Updated 11 months ago
- Datasette enrichment for analyzing row data using OpenAI's GPT models ☆20 Updated last year
- ☆18 Updated last year
- Rats is a collection of tools to help researchers define and run experiments. It is designed to be a modular and extensible framework cur… ☆26 Updated last week
- Framework-Agnostic RL Environments for LLM Fine-Tuning ☆35 Updated this week
- Transformer GPU VRAM estimator ☆66 Updated last year
- A Pub/Sub for Tables based data integration platform, to discover, publish, modify and consume data effortlessly. ☆34 Updated last week
- Efficient BM25 with DuckDB 🦆 ☆55 Updated 9 months ago
- Visualize expert firing frequencies across sentences in the Mixtral MoE model ☆18 Updated last year
- Datamodels for hugging face tokenizers ☆71 Updated this week