apache / datasketches-python
Apache datasketches
☆33 · Updated 5 months ago
Alternatives and similar repositories for datasketches-python
Users who are interested in datasketches-python are comparing it to the libraries listed below.
- ☆11 · Updated 2 years ago
- ☆34 · Updated 2 years ago
- A portable Pythonic Data Lakehouse powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to … ☆233 · Updated last week
- DuckDB extension that adds support for SQL/PGQ and graph algorithms ☆239 · Updated 2 weeks ago
- Ibis Substrait Compiler ☆104 · Updated 3 weeks ago
- Inspect ML Pipelines in Python in the form of a DAG ☆70 · Updated last year
- A playground for running duckdb as a stateless query engine over a data lake ☆210 · Updated last year
- ☆46 · Updated last month
- Train Gradient Boosting and Random Forest with only SQL (VLDB 2023) ☆23 · Updated last year
- ☆58 · Updated last year
- DuckDB is an in-process SQL OLAP Database Management System ☆44 · Updated 3 weeks ago
- Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning ☆51 · Updated 2 years ago
- Code and Benchmarks for JOSIE (SIGMOD 2019) ☆19 · Updated 2 years ago
- Distributed Bayesian Entity Resolution in Apache Spark ☆57 · Updated 4 years ago
- Work with your web service, database, and streaming schemas in a single format. ☆345 · Updated last month
- ☆79 · Updated 2 years ago
- Graph Engine for Exploration and Search ☆42 · Updated last year
- Arrow, pydantic style ☆84 · Updated 2 years ago
- ☆298 · Updated last week
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆29 · Updated 8 months ago
- An example Flight SQL Server implementation - with DuckDB and SQLite back-ends. ☆265 · Updated 10 months ago
- Ray-based Apache Beam runner ☆41 · Updated last year
- Embedded MonetDB with a Python frontend and fast Numpy/Pandas support ☆63 · Updated 10 months ago
- A Python-to-SQL transpiler as replacement for Python Pandas ☆48 · Updated 2 years ago
- A collection of handy CLI tools to convert CSV and JSON to Apache Arrow and Parquet ☆183 · Updated last week
- Core C++ Sketch Library ☆236 · Updated 3 weeks ago
- PyPi module for Graphlet AI Knowledge Graph Factory ☆29 · Updated 2 years ago
- Unified Distributed Execution ☆55 · Updated 9 months ago
- Pollock is a benchmark for data loading on character-delimited files. ☆20 · Updated 3 months ago
- ☆9 · Updated last year