Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
1,536 stars · Dec 2, 2024 · Updated last year
Alternatives and similar repositories for optimus
Users that are interested in optimus are comparing it to the libraries listed below.
- A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex) · 141 stars · Jul 15, 2023 · Updated 2 years ago
- Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s… · 8,507 stars · Apr 1, 2026 · Updated last month
- NumPy and Pandas interface to Big Data · 3,196 stars · Sep 29, 2023 · Updated 2 years ago
- Modin: Scale your Pandas workflows by changing a single line of code · 10,391 stars · Feb 10, 2026 · Updated 3 months ago
- An open source python library for automated feature engineering · 7,650 stars · Feb 3, 2026 · Updated 3 months ago
- pyspark methods to enhance developer productivity · 687 stars · Mar 6, 2025 · Updated last year
- Always know what to expect from your data. · 11,525 stars · Updated this week
- A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rew… · 2,162 stars · May 19, 2026 · Updated last week
- 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames. · 13,567 stars · Apr 22, 2026 · Updated last month
- Parallel computing with task scheduling · 13,845 stars · Updated this week
- Open-source, low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code. · 2,241 stars · Jun 27, 2024 · Updated last year
- Build, Manage and Deploy AI/ML Systems · 10,105 stars · May 18, 2026 · Updated last week
- Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and… · 10,868 stars · Updated this week
- Parameterize, execute, and analyze notebooks · 6,447 stars · May 12, 2026 · Updated 2 weeks ago
- An orchestration platform for the development, production, and observation of data assets. · 15,565 stars · Updated this week
- A Python package for manipulating 2-dimensional tabular data structures · 1,878 stars · Mar 17, 2025 · Updated last year
- the portable Python dataframe library · 6,545 stars · May 20, 2026 · Updated last week
- Clean APIs for data cleaning. Python implementation of R package Janitor · 1,494 stars · Updated this week
- A light-weight, flexible, and expressive statistical data testing library · 4,344 stars · May 21, 2026 · Updated last week
- Prefect is a workflow orchestration framework for building resilient data pipelines in Python. · 22,442 stars · Updated this week
- Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. · 3,618 stars · Updated this week
- The Open Source Feature Store for AI/ML · 7,052 stars · Updated this week
- Data Versioning and ML Experiments · 15,620 stars · May 18, 2026 · Updated last week
- Jupyter magics and kernels for working with remote Spark clusters · 1,361 stars · Sep 9, 2025 · Updated 8 months ago
- Python Helper library for Jupyter Notebooks · 1,041 stars · Feb 16, 2021 · Updated 5 years ago
- A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. · 10,051 stars · Sep 11, 2025 · Updated 8 months ago
- The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, a… · 26,072 stars · Updated this week
- Easy pipelines for pandas DataFrames. · 724 stars · May 9, 2026 · Updated 2 weeks ago
- Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting… · 4,770 stars · May 1, 2026 · Updated 3 weeks ago
- Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis… · 18,723 stars · May 19, 2026 · Updated last week
- A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow · 2,085 stars · Dec 15, 2023 · Updated 2 years ago
- MLeap: Deploy ML Pipelines to Production · 1,535 stars · Mar 10, 2026 · Updated 2 months ago
- A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner · 2,640 stars · Mar 20, 2024 · Updated 2 years ago
- Visual analysis and diagnostic tools to facilitate machine learning model selection. · 4,397 stars · Feb 19, 2025 · Updated last year
- A library for debugging/inspecting machine learning classifiers and explaining their predictions · 2,775 stars · Apr 8, 2026 · Updated last month
- re_data - fix data issues before your users & CEO discover them · 1,569 stars · Apr 30, 2024 · Updated 2 years ago
- Automatic extraction of relevant features from time series: · 9,219 stars · Nov 15, 2025 · Updated 6 months ago
- Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon. · 533 stars · Apr 26, 2026 · Updated last month
- Apache (Py)Spark type annotations (stub files). · 118 stars · Aug 17, 2022 · Updated 3 years ago