Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
★1,534 · Dec 2, 2024 · Updated last year
Alternatives and similar repositories for optimus
Users that are interested in optimus are comparing it to the libraries listed below.
- A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex) ★141 · Jul 15, 2023 · Updated 2 years ago
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… ★8,505 · Apr 1, 2026 · Updated last month
- NumPy and Pandas interface to Big Data ★3,195 · Sep 29, 2023 · Updated 2 years ago
- Modin: Scale your Pandas workflows by changing a single line of code ★10,384 · Feb 10, 2026 · Updated 2 months ago
- An open source python library for automated feature engineering ★7,633 · Feb 3, 2026 · Updated 3 months ago
- pyspark methods to enhance developer productivity ★687 · Mar 6, 2025 · Updated last year
- Always know what to expect from your data. ★11,458 · Updated this week
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ★2,155 · Updated this week
- 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames. ★13,534 · Apr 22, 2026 · Updated 2 weeks ago
- Parallel computing with task scheduling ★13,819 · Apr 28, 2026 · Updated last week
- Open-source low code data preparation library in python. Collect, clean and visualize your data in python with a few lines of code. ★2,241 · Jun 27, 2024 · Updated last year
- Build, Manage and Deploy AI/ML Systems ★10,078 · Updated this week
- Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and… ★10,860 · Updated this week
- Parameterize, execute, and analyze notebooks ★6,439 · Apr 6, 2026 · Updated last month
- An orchestration platform for the development, production, and observation of data assets. ★15,425 · Updated this week
- A Python package for manipulating 2-dimensional tabular data structures ★1,880 · Mar 17, 2025 · Updated last year
- the portable Python dataframe library ★6,521 · Apr 29, 2026 · Updated last week
- Clean APIs for data cleaning. Python implementation of R package Janitor ★1,488 · Updated this week
- A light-weight, flexible, and expressive statistical data testing library ★4,327 · Updated this week
- Prefect is a workflow orchestration framework for building resilient data pipelines in Python. ★22,283 · Updated this week
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ★3,614 · Updated this week
- The Open Source Feature Store for AI/ML ★7,000 · Updated this week
- Data Versioning and ML Experiments ★15,576 · Apr 28, 2026 · Updated last week
- Jupyter magics and kernels for working with remote Spark clusters ★1,360 · Sep 9, 2025 · Updated 7 months ago
- Python Helper library for Jupyter Notebooks ★1,041 · Feb 16, 2021 · Updated 5 years ago
- A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. ★10,049 · Sep 11, 2025 · Updated 7 months ago
- The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, a… ★25,667 · Updated this week
- Easy pipelines for pandas DataFrames. ★724 · Updated this week
- Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting… ★4,762 · Updated this week
- Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis… ★18,712 · Apr 10, 2026 · Updated 3 weeks ago
- A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow ★2,085 · Dec 15, 2023 · Updated 2 years ago
- MLeap: Deploy ML Pipelines to Production ★1,535 · Mar 10, 2026 · Updated last month
- A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner ★2,641 · Mar 20, 2024 · Updated 2 years ago
- Visual analysis and diagnostic tools to facilitate machine learning model selection. ★4,396 · Feb 19, 2025 · Updated last year
- A library for debugging/inspecting machine learning classifiers and explaining their predictions ★2,777 · Apr 8, 2026 · Updated 3 weeks ago
- re_data - fix data issues before your users & CEO would discover them ★1,569 · Apr 30, 2024 · Updated 2 years ago
- Automatic extraction of relevant features from time series: ★9,183 · Nov 15, 2025 · Updated 5 months ago
- Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon. ★531 · Apr 26, 2026 · Updated last week
- Apache (Py)Spark type annotations (stub files). ★118 · Aug 17, 2022 · Updated 3 years ago