Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
★ 1,535 · Dec 2, 2024 · Updated last year
Alternatives and similar repositories for optimus
Users who are interested in optimus are comparing it to the libraries listed below.
- A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex) ★ 141 · Jul 15, 2023 · Updated 2 years ago
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… ★ 8,504 · Apr 1, 2026 · Updated 2 months ago
- NumPy and Pandas interface to Big Data ★ 3,192 · Sep 29, 2023 · Updated 2 years ago
- Modin: Scale your Pandas workflows by changing a single line of code ★ 10,388 · Feb 10, 2026 · Updated 4 months ago
- An open source python library for automated feature engineering ★ 7,653 · Jun 9, 2026 · Updated last week
- pyspark methods to enhance developer productivity ★ 687 · Jun 9, 2026 · Updated last week
- Always know what to expect from your data. ★ 11,556 · Updated this week
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… ★ 2,165 · May 19, 2026 · Updated 3 weeks ago
- 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames. ★ 13,602 · Apr 22, 2026 · Updated last month
- Parallel computing with task scheduling ★ 13,846 · Updated this week
- Open-source low code data preparation library in python. Collect, clean and visualize your data in python with a few lines of code. ★ 2,243 · Jun 27, 2024 · Updated last year
- Build, Manage and Deploy AI/ML Systems ★ 10,129 · Updated this week
- Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and… ★ 10,887 · Updated this week
- Parameterize, execute, and analyze notebooks ★ 6,450 · May 12, 2026 · Updated last month
- An orchestration platform for the development, production, and observation of data assets. ★ 15,699 · Updated this week
- A Python package for manipulating 2-dimensional tabular data structures ★ 1,878 · Mar 17, 2025 · Updated last year
- the portable Python dataframe library ★ 6,573 · Updated this week
- Clean APIs for data cleaning. Python implementation of R package Janitor ★ 1,496 · Jun 10, 2026 · Updated last week
- A light-weight, flexible, and expressive statistical data testing library ★ 4,376 · Updated this week
- Prefect is a workflow orchestration framework for building resilient data pipelines in Python. ★ 22,598 · Updated this week
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. ★ 3,622 · Updated this week
- The Open Source Feature Store for AI/ML ★ 7,085 · Jun 10, 2026 · Updated last week
- Data Versioning and ML Experiments ★ 15,675 · Jun 8, 2026 · Updated last week
- Jupyter magics and kernels for working with remote Spark clusters ★ 1,360 · Sep 9, 2025 · Updated 9 months ago
- Python Helper library for Jupyter Notebooks ★ 1,041 · Feb 16, 2021 · Updated 5 years ago
- A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. ★ 10,049 · Sep 11, 2025 · Updated 9 months ago
- The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, a… ★ 26,506 · Updated this week
- Easy pipelines for pandas DataFrames. ★ 729 · Jun 5, 2026 · Updated last week
- Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting… ★ 4,774 · Jun 1, 2026 · Updated 2 weeks ago
- Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis… ★ 18,738 · May 19, 2026 · Updated 3 weeks ago
- A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow ★ 2,086 · Dec 15, 2023 · Updated 2 years ago
- MLeap: Deploy ML Pipelines to Production ★ 1,538 · Mar 10, 2026 · Updated 3 months ago
- A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner ★ 2,641 · Mar 20, 2024 · Updated 2 years ago
- Visual analysis and diagnostic tools to facilitate machine learning model selection. ★ 4,399 · Feb 19, 2025 · Updated last year
- A library for debugging/inspecting machine learning classifiers and explaining their predictions ★ 2,776 · Apr 8, 2026 · Updated 2 months ago
- re_data - fix data issues before your users & CEO would discover them ★ 1,567 · Apr 30, 2024 · Updated 2 years ago
- Automatic extraction of relevant features from time series: ★ 9,247 · Jun 8, 2026 · Updated last week
- Engine for AI/ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon. ★ 532 · Jun 10, 2026 · Updated last week
- Apache (Py)Spark type annotations (stub files). ★ 118 · Aug 17, 2022 · Updated 3 years ago