huggingface / dedupe_estimator
Chunk Dedupe Estimation
☆12 · Updated 4 months ago
Alternatives and similar repositories for dedupe_estimator:
Users interested in dedupe_estimator are comparing it to the libraries listed below.
- tsellm: LLMs in SQLite and DuckDB ☆22 · Updated 7 months ago
- Rust crates for XetHub ☆36 · Updated 5 months ago
- "llm python" is a command to run a Python interpreter in the LLM virtual environment ☆31 · Updated last year
- Hybrid Search (BM25 & Vector) with SQLite ☆13 · Updated 7 months ago
- Tabsdata Open Source ☆25 · Updated last week
- ColBERT for live vector indexes ☆22 · Updated 5 months ago
- OpenDAL fsspec integration ☆28 · Updated 2 months ago
- Tools for building SQLite databases from files and directories ☆12 · Updated last year
- Feature selection for tabular datasets using advanced filter and wrapper methods ☆17 · Updated 3 weeks ago
- Sample code to accompany blog post showcasing Arrow Flight SQL running on DuckDB ☆32 · Updated 2 years ago
- CuVS integration for Lucene ☆33 · Updated 3 months ago
- 🛡️ Managed isolated environments for Python ☆88 · Updated last month
- An open-source, community-driven REST catalog for Apache Iceberg! ☆26 · Updated 9 months ago
- A collection of prompts for use with the LLM CLI tool ☆15 · Updated last year
- A conda-smithy repository for python-duckdb. ☆13 · Updated 2 weeks ago
- 🛤️ Pathik - High-Performance Web Crawler ⚡ ☆25 · Updated this week
- Inspect Your Servers with DuckDB ☆30 · Updated 2 years ago
- GizmoSQL public repo: used for README purposes and to make artifacts available for public download ☆19 · Updated 2 weeks ago
- A high-performance, in-memory, git-backed OLAP database (of nothing). ☆12 · Updated 2 months ago
- The (B)ig (F)unction (T)axonomy is a detailed reference for common compute functions executed by different libraries, databases, and tool… ☆16 · Updated 3 months ago
- Experimental ClickHouse Native Client and Native file reader Extension for DuckDB chsql ☆11 · Updated this week
- Display version and compression information about a parquet file ☆23 · Updated 2 weeks ago
- Excel extension for DuckDB ☆28 · Updated last week
- Speed up fsspec data access with Alluxio distributed caching. ☆14 · Updated last week