xdssio / big_data_benchmarks
Comparisons of big data technologies for cleaning, manipulating, and generally wrangling data for analysis and machine learning.
☆65 · Updated 4 years ago
Related projects
Alternatives and complementary repositories for big_data_benchmarks
- General Interpretability Package ☆58 · Updated last year
- Talks about vaex ☆36 · Updated last year
- Automated Data Science and Machine Learning library to optimize workflow. ☆104 · Updated last year
- Hypergol is a Data Science/Machine Learning productivity toolkit to accelerate any projects into production with autogenerated code, stan… ☆53 · Updated last year
- Tries to shrink your Pandas column dtypes with no data loss so you have more spare RAM ☆82 · Updated 10 months ago
- Automated Exploratory Data Analysis. Simplifying Data Exploration ☆34 · Updated 4 years ago
- A bit of extra usability for sqlalchemy v2. ☆77 · Updated 5 months ago
- python library for automated dataset normalization ☆112 · Updated last year
- ☆23 · Updated this week
- ☆28 · Updated 5 years ago
- NitroML is a modular, portable, and scalable model-quality benchmarking framework for Machine Learning and Automated Machine Learning (Au… ☆42 · Updated 3 years ago
- Interactive visualization of machine learning model evaluation metrics ☆62 · Updated 5 years ago
- A package for data science practitioners. This library implements a number of helpful, common data transformations with a scikit-learn fr… ☆55 · Updated 3 years ago
- Spark NLP for Streamlit ☆15 · Updated 3 years ago
- Guide for applying Unit Testing in data-driven projects ☆19 · Updated 4 years ago
- bamboolib - a GUI for pandas dataframes. Stop googling pandas commands ☆28 · Updated 4 years ago
- Makes Interactive Chart Widget, Cleans raw data, Runs baseline models, Interactive hyperparameter tuning & tracking ☆55 · Updated 3 years ago
- ForML - A development framework and MLOps platform for the lifecycle management of data science projects ☆104 · Updated last year
- Materials for the SciPy 2019 RAPIDS tutorial ☆21 · Updated 5 years ago
- Record matching and entity resolution at scale in Spark ☆31 · Updated last year
- Instant search for and access to many datasets in Pyspark. ☆34 · Updated 2 years ago
- Examples of using vaex ☆72 · Updated 8 months ago
- Tutorial for a new versioning Machine Learning pipeline ☆81 · Updated 3 years ago
- Summarise and explore Pandas DataFrames ☆100 · Updated 4 years ago
- This project is created to promote and advocate the use of FOSS machine learning. ☆44 · Updated 2 months ago
- A scikit-learn compatible estimator based on business-rules with interactive dashboard included ☆28 · Updated 3 years ago
- MLflow-tracking server example with Minio and H2O ☆18 · Updated 5 years ago
- Pandas helper functions ☆29 · Updated last year
- A frictionless integrated platform for notebook ☆85 · Updated last year
- Comparing Polars to Pandas and a small introduction ☆43 · Updated 3 years ago