Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
⭐ 1,537 · Dec 2, 2024 · Updated last year
Alternatives and similar repositories for optimus
Users that are interested in optimus are comparing it to the libraries listed below.
- A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex) · ⭐ 141 · Jul 15, 2023 · Updated 2 years ago
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… · ⭐ 8,501 · Apr 1, 2026 · Updated 2 weeks ago
- NumPy and Pandas interface to Big Data · ⭐ 3,194 · Sep 29, 2023 · Updated 2 years ago
- Modin: Scale your Pandas workflows by changing a single line of code · ⭐ 10,377 · Feb 10, 2026 · Updated 2 months ago
- An open source python library for automated feature engineering · ⭐ 7,629 · Feb 3, 2026 · Updated 2 months ago
- pyspark methods to enhance developer productivity · ⭐ 687 · Mar 6, 2025 · Updated last year
- Always know what to expect from your data. · ⭐ 11,391 · Updated this week
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… · ⭐ 2,149 · Apr 1, 2026 · Updated 2 weeks ago
- 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames. · ⭐ 13,493 · Updated this week
- Parallel computing with task scheduling · ⭐ 13,799 · Apr 7, 2026 · Updated last week
- Open-source low code data preparation library in python. Collect, clean and visualize your data in python with a few lines of code. · ⭐ 2,237 · Jun 27, 2024 · Updated last year
- Build, Manage and Deploy AI/ML Systems · ⭐ 10,040 · Updated this week
- Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and… · ⭐ 10,822 · Updated this week
- Parameterize, execute, and analyze notebooks · ⭐ 6,429 · Apr 6, 2026 · Updated last week
- An orchestration platform for the development, production, and observation of data assets. · ⭐ 15,312 · Updated this week
- A Python package for manipulating 2-dimensional tabular data structures · ⭐ 1,881 · Mar 17, 2025 · Updated last year
- the portable Python dataframe library · ⭐ 6,493 · Apr 8, 2026 · Updated last week
- Clean APIs for data cleaning. Python implementation of R package Janitor · ⭐ 1,487 · Updated this week
- A light-weight, flexible, and expressive statistical data testing library · ⭐ 4,308 · Updated this week
- Prefect is a workflow orchestration framework for building resilient data pipelines in Python. · ⭐ 22,126 · Updated this week
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. · ⭐ 3,605 · Apr 1, 2026 · Updated 2 weeks ago
- The Open Source Feature Store for AI/ML · ⭐ 6,956 · Updated this week
- Data Versioning and ML Experiments · ⭐ 15,524 · Apr 7, 2026 · Updated last week
- Jupyter magics and kernels for working with remote Spark clusters · ⭐ 1,361 · Sep 9, 2025 · Updated 7 months ago
- Python Helper library for Jupyter Notebooks · ⭐ 1,041 · Feb 16, 2021 · Updated 5 years ago
- A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. · ⭐ 10,041 · Sep 11, 2025 · Updated 7 months ago
- The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, a… · ⭐ 25,280 · Updated this week
- Easy pipelines for pandas DataFrames. · ⭐ 724 · Apr 6, 2026 · Updated last week
- Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting… · ⭐ 4,757 · Apr 2, 2026 · Updated 2 weeks ago
- Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis… · ⭐ 18,702 · Updated this week
- A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow · ⭐ 2,085 · Dec 15, 2023 · Updated 2 years ago
- MLeap: Deploy ML Pipelines to Production · ⭐ 1,536 · Mar 10, 2026 · Updated last month
- A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner · ⭐ 2,640 · Mar 20, 2024 · Updated 2 years ago
- Visual analysis and diagnostic tools to facilitate machine learning model selection. · ⭐ 4,399 · Feb 19, 2025 · Updated last year
- A library for debugging/inspecting machine learning classifiers and explaining their predictions · ⭐ 2,777 · Apr 8, 2026 · Updated last week
- re_data - fix data issues before your users & CEO would discover them · ⭐ 1,570 · Apr 30, 2024 · Updated last year
- Automatic extraction of relevant features from time series · ⭐ 9,169 · Nov 15, 2025 · Updated 5 months ago
- Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon. · ⭐ 531 · Updated this week
- Apache (Py)Spark type annotations (stub files). · ⭐ 118 · Aug 17, 2022 · Updated 3 years ago