Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
★1,540 Dec 2, 2024 Updated last year
Alternatives and similar repositories for optimus
Users who are interested in optimus are comparing it to the libraries listed below
- A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex) ★141 Jul 15, 2023 Updated 2 years ago
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… ★8,483 Updated this week
- An open source python library for automated feature engineering ★7,617 Feb 3, 2026 Updated last month
- Modin: Scale your Pandas workflows by changing a single line of code ★10,363 Feb 10, 2026 Updated 3 weeks ago
- NumPy and Pandas interface to Big Data ★3,196 Sep 29, 2023 Updated 2 years ago
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ★2,139 Feb 21, 2026 Updated last week
- 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames. ★13,399 Feb 27, 2026 Updated last week
- Always know what to expect from your data. ★11,197 Feb 27, 2026 Updated last week
- Open-source low-code data preparation library in Python. Collect, clean and visualize your data in Python with a few lines of code. ★2,237 Jun 27, 2024 Updated last year
- Parallel computing with task scheduling ★13,754 Updated this week
- Build, Manage and Deploy AI/ML Systems ★9,903 Updated this week
- Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and… ★10,771 Feb 26, 2026 Updated last week
- pyspark methods to enhance developer productivity ★683 Mar 6, 2025 Updated last year
- Parameterize, execute, and analyze notebooks ★6,390 Feb 27, 2026 Updated last week
- Clean APIs for data cleaning. Python implementation of R package Janitor ★1,483 Updated this week
- An orchestration platform for the development, production, and observation of data assets. ★15,049 Updated this week
- the portable Python dataframe library ★6,440 Updated this week
- A Python package for manipulating 2-dimensional tabular data structures ★1,883 Mar 17, 2025 Updated 11 months ago
- A light-weight, flexible, and expressive statistical data testing library ★4,218 Feb 19, 2026 Updated 2 weeks ago
- Prefect is a workflow orchestration framework for building resilient data pipelines in Python. ★21,697 Updated this week
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ★3,588 Feb 17, 2026 Updated 2 weeks ago
- The Open Source Feature Store for AI/ML ★6,756 Updated this week
- Data Versioning and ML Experiments ★15,404 Feb 27, 2026 Updated last week
- Python Helper library for Jupyter Notebooks ★1,040 Feb 16, 2021 Updated 5 years ago
- Easy pipelines for pandas DataFrames. ★725 Updated this week
- A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. ★10,046 Sep 11, 2025 Updated 5 months ago
- A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner ★2,642 Mar 20, 2024 Updated last year
- Visual analysis and diagnostic tools to facilitate machine learning model selection. ★4,396 Feb 19, 2025 Updated last year
- The open source developer platform to build AI agents and models with confidence. Enhance your AI applications with end-to-end tracking, … ★24,485 Updated this week
- MLeap: Deploy ML Pipelines to Production ★1,536 Jan 12, 2026 Updated last month
- Jupyter magics and kernels for working with remote Spark clusters ★1,362 Sep 9, 2025 Updated 5 months ago
- Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting… ★4,744 Updated this week
- A library for debugging/inspecting machine learning classifiers and explaining their predictions ★2,772 Feb 10, 2026 Updated 3 weeks ago
- Voilà turns Jupyter notebooks into standalone web applications ★5,902 Updated this week
- bamboolib - a GUI for pandas DataFrames ★952 Feb 20, 2024 Updated 2 years ago
- Declarative visualization library for Python ★10,276 Feb 27, 2026 Updated last week
- A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow ★2,086 Dec 15, 2023 Updated 2 years ago
- Automatic extraction of relevant features from time series: ★9,127 Nov 15, 2025 Updated 3 months ago
- re_data - fix data issues before your users & CEO would discover them ★1,569 Apr 30, 2024 Updated last year