ibis-project / ibis
the portable Python dataframe library
☆5,064 · Updated this week
Related projects:
- A light-weight, flexible, and expressive statistical data testing library · ☆3,258 · Updated this week
- Parallel computing with task scheduling · ☆12,405 · Updated this week
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… · ☆1,968 · Updated last month
- Always know what to expect from your data. · ☆9,817 · Updated this week
- An orchestration platform for the development, production, and observation of data assets. · ☆11,155 · Updated this week
- 📚 Parameterize, execute, and analyze notebooks · ☆5,789 · Updated 3 weeks ago
- dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build application… · ☆9,626 · Updated this week
- Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, v… · ☆3,789 · Updated this week
- Fastest library to load data from DB to DataFrames in Rust and Python · ☆1,933 · Updated this week
- Python SQL Parser and Transpiler · ☆6,395 · Updated this week
- Computing with Python functions. · ☆3,815 · Updated 3 weeks ago
- Koalas: pandas API on Apache Spark · ☆3,329 · Updated 5 months ago
- Modin: Scale your Pandas workflows by changing a single line of code · ☆9,747 · Updated this week
- Efficient data transformation and modeling framework that is backwards compatible with dbt. · ☆1,612 · Updated this week
- Build and manage real-life ML, AI, and data science projects with ease! · ☆8,046 · Updated this week
- Utils for streaming large files (S3, HDFS, gzip, bz2...) · ☆3,174 · Updated this week
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… · ☆8,257 · Updated this week
- Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and… · ☆9,837 · Updated this week
- Prefect is a workflow orchestration framework for building resilient data pipelines in Python. · ☆15,830 · Updated this week
- Declarative statistical visualization library for Python · ☆9,227 · Updated this week
- A Python package for manipulating 2-dimensional tabular data structures · ☆1,807 · Updated 9 months ago
- Distributed DataFrame for Python designed for the cloud, powered by Rust · ☆2,080 · Updated this week
- The Open Source Feature Store for Machine Learning · ☆5,476 · Updated this week
- Python Stream Processing · ☆1,467 · Updated this week
- A native Rust library for Delta Lake, with bindings into Python · ☆2,155 · Updated this week
- Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io · ☆1,866 · Updated last week
- NumPy and Pandas interface to Big Data · ☆3,180 · Updated 11 months ago
- The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️ · ☆3,479 · Updated 2 months ago
- Voilà turns Jupyter notebooks into standalone web applications · ☆5,394 · Updated 2 weeks ago
- A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow · ☆2,068 · Updated 9 months ago