Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
★1,539 · Dec 2, 2024 · Updated last year
Alternatives and similar repositories for optimus
Users that are interested in optimus are comparing it to the libraries listed below.
- A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex) ★141 · Jul 15, 2023 · Updated 2 years ago
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… ★8,498 · Mar 1, 2026 · Updated 3 weeks ago
- NumPy and Pandas interface to Big Data ★3,195 · Sep 29, 2023 · Updated 2 years ago
- Modin: Scale your Pandas workflows by changing a single line of code ★10,364 · Feb 10, 2026 · Updated last month
- An open source python library for automated feature engineering ★7,626 · Feb 3, 2026 · Updated last month
- pyspark methods to enhance developer productivity ★685 · Mar 6, 2025 · Updated last year
- Always know what to expect from your data. ★11,280 · Updated this week
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ★2,142 · Mar 12, 2026 · Updated 2 weeks ago
- 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames. ★13,438 · Mar 3, 2026 · Updated 3 weeks ago
- Parallel computing with task scheduling ★13,774 · Mar 19, 2026 · Updated last week
- Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code. ★2,239 · Jun 27, 2024 · Updated last year
- Build, Manage and Deploy AI/ML Systems ★9,973 · Updated this week
- Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and… ★10,798 · Mar 19, 2026 · Updated last week
- Parameterize, execute, and analyze notebooks ★6,407 · Mar 16, 2026 · Updated last week
- An orchestration platform for the development, production, and observation of data assets. ★15,134 · Updated this week
- A Python package for manipulating 2-dimensional tabular data structures ★1,882 · Mar 17, 2025 · Updated last year
- the portable Python dataframe library ★6,457 · Mar 19, 2026 · Updated last week
- A light-weight, flexible, and expressive statistical data testing library ★4,271 · Updated this week
- Clean APIs for data cleaning. Python implementation of R package Janitor ★1,485 · Mar 15, 2026 · Updated last week
- Prefect is a workflow orchestration framework for building resilient data pipelines in Python. ★21,910 · Mar 20, 2026 · Updated last week
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ★3,596 · Updated this week
- The Open Source Feature Store for AI/ML ★6,824 · Updated this week
- Data Versioning and ML Experiments ★15,458 · Mar 18, 2026 · Updated last week
- Jupyter magics and kernels for working with remote Spark clusters ★1,362 · Sep 9, 2025 · Updated 6 months ago
- Python Helper library for Jupyter Notebooks ★1,041 · Feb 16, 2021 · Updated 5 years ago
- A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. ★10,047 · Sep 11, 2025 · Updated 6 months ago
- The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, a… ★24,874 · Updated this week
- Easy pipelines for pandas DataFrames. ★725 · Mar 6, 2026 · Updated 2 weeks ago
- Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting… ★4,751 · Mar 20, 2026 · Updated last week
- Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis… ★18,705 · Mar 18, 2026 · Updated last week
- A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow ★2,084 · Dec 15, 2023 · Updated 2 years ago
- MLeap: Deploy ML Pipelines to Production ★1,535 · Mar 10, 2026 · Updated 2 weeks ago
- A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner ★2,641 · Mar 20, 2024 · Updated 2 years ago
- Visual analysis and diagnostic tools to facilitate machine learning model selection. ★4,395 · Feb 19, 2025 · Updated last year
- A library for debugging/inspecting machine learning classifiers and explaining their predictions ★2,775 · Feb 10, 2026 · Updated last month
- re_data - fix data issues before your users & CEO would discover them ★1,569 · Apr 30, 2024 · Updated last year
- Automatic extraction of relevant features from time series: ★9,151 · Nov 15, 2025 · Updated 4 months ago
- Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon. ★530 · Updated this week
- Apache (Py)Spark type annotations (stub files). ★118 · Aug 17, 2022 · Updated 3 years ago