hi-primus / optimus
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
☆1,541 · Dec 2, 2024 · Updated last year
Alternatives and similar repositories for optimus
Users who are interested in optimus are comparing it to the libraries listed below.
- 🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex) ☆141 · Jul 15, 2023 · Updated 2 years ago
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… ☆8,468 · Feb 5, 2026 · Updated last week
- An open source python library for automated feature engineering ☆7,610 · Feb 3, 2026 · Updated last week
- Modin: Scale your Pandas workflows by changing a single line of code ☆10,357 · Updated this week
- NumPy and Pandas interface to Big Data ☆3,198 · Sep 29, 2023 · Updated 2 years ago
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ☆2,136 · Feb 5, 2026 · Updated last week
- 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames. ☆13,372 · Feb 2, 2026 · Updated last week
- Always know what to expect from your data. ☆11,133 · Updated this week
- Open-source low code data preparation library in python. Collect, clean and visualization your data in python with a few lines of code. ☆2,236 · Jun 27, 2024 · Updated last year
- Parallel computing with task scheduling ☆13,738 · Feb 5, 2026 · Updated last week
- Build, Manage and Deploy AI/ML Systems ☆9,746 · Feb 5, 2026 · Updated last week
- Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and… ☆10,756 · Updated this week
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆682 · Mar 6, 2025 · Updated 11 months ago
- 📚 Parameterize, execute, and analyze notebooks ☆6,373 · Jan 5, 2026 · Updated last month
- Clean APIs for data cleaning. Python implementation of R package Janitor ☆1,480 · Updated this week
- An orchestration platform for the development, production, and observation of data assets. ☆14,930 · Updated this week
- the portable Python dataframe library ☆6,385 · Feb 7, 2026 · Updated last week
- A light-weight, flexible, and expressive statistical data testing library ☆4,190 · Feb 7, 2026 · Updated last week
- A Python package for manipulating 2-dimensional tabular data structures ☆1,883 · Mar 17, 2025 · Updated 10 months ago
- Prefect is a workflow orchestration framework for building resilient data pipelines in Python. ☆21,577 · Updated this week
- The Open Source Feature Store for AI/ML ☆6,702 · Updated this week
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ☆3,580 · Feb 2, 2026 · Updated last week
- 🦉 Data Versioning and ML Experiments ☆15,347 · Feb 1, 2026 · Updated last week
- Python Helper library for Jupyter Notebooks ☆1,040 · Feb 16, 2021 · Updated 4 years ago
- Easy pipelines for pandas DataFrames. ☆723 · Jan 5, 2026 · Updated last month
- A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. ☆10,048 · Sep 11, 2025 · Updated 5 months ago
- A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner ☆2,643 · Mar 20, 2024 · Updated last year
- Visual analysis and diagnostic tools to facilitate machine learning model selection. ☆4,395 · Feb 19, 2025 · Updated 11 months ago
- The open source developer platform to build AI agents and models with confidence. Enhance your AI applications with end-to-end tracking, … ☆24,051 · Updated this week
- MLeap: Deploy ML Pipelines to Production ☆1,532 · Jan 12, 2026 · Updated last month
- Jupyter magics and kernels for working with remote Spark clusters ☆1,363 · Sep 9, 2025 · Updated 5 months ago
- Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting… ☆4,738 · Updated this week
- A library for debugging/inspecting machine learning classifiers and explaining their predictions ☆2,771 · Updated this week
- Voilà turns Jupyter notebooks into standalone web applications ☆5,893 · Feb 2, 2026 · Updated last week
- bamboolib - a GUI for pandas DataFrames ☆953 · Feb 20, 2024 · Updated last year
- Declarative visualization library for Python ☆10,246 · Feb 6, 2026 · Updated last week
- A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow