suhailrehman / fuzzydata
Fuzzy Data Benchmark
☆17 Updated last year
Alternatives and similar repositories for fuzzydata
Users who are interested in fuzzydata are comparing it to the libraries listed below.
- Ibis Substrait Compiler ☆105 Updated last week
- RFC document, tooling and other content related to the dataframe API standard ☆108 Updated last year
- ☆34 Updated 2 years ago
- ☆112 Updated 3 weeks ago
- Distributed SQL Query Engine in Python using Ray ☆246 Updated last year
- Unified Distributed Execution ☆56 Updated 11 months ago
- A Python-to-SQL transpiler as replacement for Python Pandas ☆49 Updated 2 years ago
- ☆48 Updated 3 months ago
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ☆42 Updated 2 years ago
- Apache Arrow PostgreSQL connector ☆62 Updated last year
- A portable Multimodal Lakehouse powered by Ray that brings exabyte-level scalability and fast, ACID-compliant, change-data-capture to you… ☆245 Updated last week
- Distributed SQL Engine in Python using Dask ☆407 Updated last year
- Arrow, pydantic style ☆85 Updated 2 years ago
- Train Gradient Boosting and Random Forest with only SQL (VLDB 2023) ☆25 Updated last year
- A playground for running duckdb as a stateless query engine over a data lake ☆211 Updated last year
- Dias: Dynamic Rewriting of Pandas Code ☆79 Updated 2 months ago
- Embedded MonetDB with a Python frontend and fast Numpy/Pandas support ☆63 Updated last year
- Helpers for Arrow C Data & Arrow C Stream interfaces ☆207 Updated last week
- DuckDB is an in-process SQL OLAP Database Management System ☆44 Updated 2 months ago
- ☆38 Updated this week
- IbisML is a library for building scalable ML pipelines using Ibis. ☆116 Updated 2 months ago
- A Delta Lake reader for Dask ☆53 Updated 2 months ago
- Flow with FlorDB 🌻 ☆153 Updated 2 weeks ago
- reproducible benchmark of database-like ops ☆174 Updated last week
- High-Performance Python Compute Engine for Data and AI ☆306 Updated this week
- Apache Arrow Flight SQL adapter for PostgreSQL ☆95 Updated last month
- Cylon is a fast, scalable, distributed memory, parallel runtime with a Pandas like DataFrame. ☆301 Updated last year
- ☆88 Updated 8 months ago
- Coming soon ☆62 Updated last year
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆114 Updated last year