apache / datasketches-python
Apache DataSketches
☆29 Updated last month
Alternatives and similar repositories for datasketches-python:
Users who are interested in datasketches-python are comparing it to the libraries listed below.
- A write-audit-publish implementation on a data lake without the JVM ☆46 Updated 8 months ago
- A portable Pythonic Data Lakehouse powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to … ☆205 Updated this week
- A playground for running duckdb as a stateless query engine over a data lake ☆197 Updated last year
- Core C++ Sketch Library ☆231 Updated 2 months ago
- Ibis Substrait Compiler ☆102 Updated this week
- ☆256 Updated last week
- ☆40 Updated 2 weeks ago
- Redset is a dataset containing three months worth of user query metadata that ran on a selected sample of instances in the Amazon Redshif… ☆58 Updated 7 months ago
- ☆32 Updated last year
- ☆57 Updated last year
- A Python-to-SQL transpiler as replacement for Python Pandas ☆48 Updated 2 years ago
- Point-in-Time optimizations for Apache Spark ☆29 Updated last year
- Arrow, pydantic style ☆82 Updated 2 years ago
- ☆16 Updated last week
- ☆69 Updated 2 months ago
- PostgreSQL extension providing approximate algorithms based on apache/datasketches-cpp ☆86 Updated 3 months ago
- An example Flight SQL Server implementation - with DuckDB and SQLite back-ends. ☆246 Updated 7 months ago
- Apache Arrow PostgreSQL connector ☆59 Updated last year
- DuckDB extension for Delta Lake ☆176 Updated 3 weeks ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆224 Updated last month
- DuckDB is an in-process SQL OLAP Database Management System ☆43 Updated last week
- ☆68 Updated 3 months ago
- Python bindings for sqlparser-rs ☆183 Updated 2 months ago
- Unity Catalog UI ☆40 Updated 7 months ago
- Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html ☆61 Updated 2 years ago
- Inspect ML Pipelines in Python in the form of a DAG ☆70 Updated last year
- This is the main repository for SDF documentation found at docs.sdf.com, as well as public schemas, benchmarks, and examples ☆118 Updated 2 months ago
- A Delta Lake reader for Dask ☆49 Updated 6 months ago
- This repo contains information about DuckDB extensions found on GitHub. Refreshed daily ☆96 Updated this week
- Train Gradient Boosting and Random Forest with only SQL (VLDB 2023) ☆23 Updated last year