fugue-project/fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

☆2,170

Alternatives and similar repositories for fugue

Users who are interested in fugue are comparing it to the libraries listed below.

fugue-project / tutorials
Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou…
☆114 · Nov 10, 2025 · Updated 8 months ago
ibis-project / ibis
the portable Python dataframe library
☆6,606 · Updated this week
unionai-oss / pandera
A light-weight, flexible, and expressive statistical data testing library
☆4,409 · Jul 18, 2026 · Updated last week
Eventual-Inc / Daft
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
☆5,658 · Updated this week
whylabs / whylogs
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model perf…
☆2,828 · Jan 10, 2025 · Updated last year
fugue-project / tune
An abstraction layer for parameter tuning
☆35 · Dec 16, 2025 · Updated 7 months ago
modin-project / modin
Modin: Scale your Pandas workflows by changing a single line of code
☆10,393 · Feb 10, 2026 · Updated 5 months ago
SQLMesh / sqlmesh
Scalable and efficient data transformation framework - backwards compatible with dbt.
☆3,219 · Updated this week
fivetran / great_expectations
Always know what to expect from your data.
☆11,667 · Updated this week
tobymao / sqlglot
Python SQL Parser and Transpiler
☆9,457 · Updated this week
marsupialtail / quokka
Making data lake work for time series
☆1,192 · Aug 21, 2024 · Updated last year
ploomber / ploomber
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️
☆3,622 · May 29, 2025 · Updated last year
vaexio / vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s…
☆8,510 · Apr 1, 2026 · Updated 3 months ago
LineaLabs / lineapy
Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two line…
☆669 · Feb 22, 2025 · Updated last year
Nixtla / statsforecast
Lightning ⚡️ fast forecasting with statistical and econometric models.
☆4,849 · Updated this week
lance-format / lance
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data ve…
☆6,845 · Updated this week
dagster-io / dagster
An orchestration platform for the development, production, and observation of data assets.
☆15,891 · Updated this week
pola-rs / polars
Extremely fast Query Engine for DataFrames, written in Rust
☆39,089 · Updated this week
sfu-db / connector-x
Fastest library to load data from DB to DataFrames in Rust and Python
☆2,638 · Updated this week
kedro-org / kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and…
☆10,931 · Updated this week
sodadata / soda-core
Data Contracts engine for the modern data stack. https://www.soda.io
☆2,397 · Updated this week
flyteorg / flyte
Dynamic, resilient AI orchestration. Coordinate data, models, and compute as you build AI workflows.
☆7,149 · Updated this week
narwhals-dev / narwhals
Lightweight and extensible compatibility layer between dataframe libraries!
☆1,686 · Updated this week
PrefectHQ / prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
☆23,474 · Updated this week
deepchecks / deepchecks
Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML va…
☆4,039 · Dec 28, 2025 · Updated 6 months ago
Netflix / metaflow
Build, Manage and Deploy AI/ML Systems
☆10,194 · Updated this week
online-ml / river
🌊 Online machine learning in Python
☆5,887 · Updated this week
dask-contrib / dask-sql
Distributed SQL Engine in Python using Dask
☆411 · Aug 29, 2024 · Updated last year
feast-dev / feast
The Open Source Feature Store for AI/ML
☆7,170 · Updated this week
bytewax / bytewax
Python Stream Processing
☆2,037 · Jun 20, 2026 · Updated last month
delta-io / delta-rs
A native Rust library for Delta Lake, with bindings into Python
☆3,267 · Updated this week
orchest / orchest
Build data pipelines, the easy way 🛠️
☆4,135 · Jun 6, 2023 · Updated 3 years ago
apache / datafusion
Apache DataFusion SQL Query Engine
☆9,010 · Updated this week
OpenLineage / OpenLineage
An Open Standard for lineage metadata collection
☆2,560 · Updated this week
nteract / papermill
📚 Parameterize, execute, and analyze notebooks
☆6,461 · Jul 6, 2026 · Updated 2 weeks ago
re-data / re-data
re_data - fix data issues before your users & CEO would discover them 😊
☆1,566 · Apr 30, 2024 · Updated 2 years ago
dlt-hub / dlt
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
☆5,657 · Updated this week
stitchfix / hamilton
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
☆860 · Jul 3, 2023 · Updated 3 years ago
linkedin / greykite
A flexible, intuitive and fast forecasting library
☆1,855 · Feb 20, 2025 · Updated last year