hi-primus/optimus

Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

☆1,536

Alternatives and similar repositories for optimus

Users who are interested in optimus compare it to the libraries listed below.

hi-primus / bumblebee
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
☆141 · Jul 15, 2023 · Updated 3 years ago
databricks / koalas
Koalas: pandas API on Apache Spark
☆3,372 · Mar 20, 2024 · Updated 2 years ago
vaexio / vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s…
☆8,510 · Apr 1, 2026 · Updated 3 months ago
blaze / blaze
NumPy and Pandas interface to Big Data
☆3,189 · Sep 29, 2023 · Updated 2 years ago
modin-project / modin
Modin: Scale your Pandas workflows by changing a single line of code
☆10,393 · Feb 10, 2026 · Updated 5 months ago
alteryx / featuretools
An open source python library for automated feature engineering
☆7,666 · Updated this week
mrpowers-io / quinn
pyspark methods to enhance developer productivity 📣 👯 🎉
☆687 · Jun 9, 2026 · Updated last month
fivetran / great_expectations
Always know what to expect from your data.
☆11,675 · Updated this week
Data-Centric-AI-Community / fg-data-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
☆13,654 · Apr 22, 2026 · Updated 3 months ago
fugue-project / fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew…
☆2,170 · May 19, 2026 · Updated 2 months ago
dask / dask
Parallel computing with task scheduling
☆13,871 · Jul 20, 2026 · Updated last week
sfu-db / dataprep
Open-source low code data preparation library in python. Collect, clean and visualization your data in python with a few lines of code.
☆2,247 · Jun 27, 2024 · Updated 2 years ago
Netflix / metaflow
Build, Manage and Deploy AI/ML Systems
☆10,198 · Updated this week
kedro-org / kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and…
☆10,937 · Updated this week
nteract / papermill
📚 Parameterize, execute, and analyze notebooks
☆6,462 · Jul 6, 2026 · Updated 3 weeks ago
dagster-io / dagster
An orchestration platform for the development, production, and observation of data assets.
☆15,909 · Updated this week
h2oai / datatable
A Python package for manipulating 2-dimensional tabular data structures
☆1,876 · Updated this week
ibis-project / ibis
the portable Python dataframe library
☆6,612 · Updated this week
unionai-oss / pandera
A light-weight, flexible, and expressive statistical data testing library
☆4,411 · Jul 18, 2026 · Updated last week
pyjanitor-devs / pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
☆1,501 · Jul 20, 2026 · Updated last week
awslabs / deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
☆3,638 · Updated this week
feast-dev / feast
The Open Source Feature Store for AI/ML
☆7,178 · Updated this week
PrefectHQ / prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
☆23,494 · Updated this week
treeverse / dvc
🦉 Data Versioning and ML Experiments
☆15,776 · Jul 21, 2026 · Updated last week
pixiedust / pixiedust
Python Helper library for Jupyter Notebooks
☆1,040 · Feb 16, 2021 · Updated 5 years ago
EpistasisLab / tpot
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
☆10,050 · Sep 11, 2025 · Updated 10 months ago
jupyter-incubator / sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
☆1,364 · Sep 9, 2025 · Updated 10 months ago
mlflow / mlflow
The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, a…
☆27,235 · Updated this week
spotify / luigi
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis…
☆18,752 · Jul 18, 2026 · Updated last week
amundsen-io / amundsen
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting…
☆4,780 · Jul 1, 2026 · Updated 3 weeks ago
pdpipe / pdpipe
Easy pipelines for pandas DataFrames.
☆729 · Jul 6, 2026 · Updated 3 weeks ago
mara / mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
☆2,089 · Dec 15, 2023 · Updated 2 years ago
combust / mleap
MLeap: Deploy ML Pipelines to Production
☆1,539 · Updated this week
jmcarpenter2 / swifter
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
☆2,639 · Mar 20, 2024 · Updated 2 years ago
DistrictDataLabs / yellowbrick
Visual analysis and diagnostic tools to facilitate machine learning model selection.
☆4,400 · Feb 19, 2025 · Updated last year
TeamHG-Memex / eli5
A library for debugging/inspecting machine learning classifiers and explaining their predictions
☆2,777 · Apr 8, 2026 · Updated 3 months ago
polyaxon / traceml
Engine for AI/ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.
☆534 · Jun 17, 2026 · Updated last month
re-data / re-data
re_data - fix data issues before your users & CEO would discover them 😊
☆1,566 · Apr 30, 2024 · Updated 2 years ago
blue-yonder / tsfresh
Automatic extraction of relevant features from time series:
☆9,277 · Jul 6, 2026 · Updated 3 weeks ago