xdssio / big_data_benchmarks
Comparisons of big data technologies for cleaning, manipulating, and generally wrangling data for analysis and machine learning.
☆65 Updated 4 years ago
Alternatives and similar repositories for big_data_benchmarks:
Users who are interested in big_data_benchmarks are comparing it to the libraries listed below.
- Automated Data Science and Machine Learning library to optimize workflow. ☆104 Updated last year
- Automated Exploratory Data Analysis. Simplifying Data Exploration ☆34 Updated 4 years ago
- ForML - A development framework and MLOps platform for the lifecycle management of data science projects ☆104 Updated last year
- Makes Interactive Chart Widget, Cleans raw data, Runs baseline models, Interactive hyperparameter tuning & tracking ☆55 Updated 3 years ago
- Python library for automated dataset normalization ☆113 Updated last year
- Talks about vaex ☆36 Updated 2 years ago
- A package for data science practitioners. This library implements a number of helpful, common data transformations with a scikit-learn fr… ☆55 Updated 3 years ago
- The fast.ai data ethics course ☆14 Updated 2 years ago
- Projects developed by Domino's R&D team ☆76 Updated 2 years ago
- Instant search for and access to many datasets in PySpark. ☆34 Updated 2 years ago
- Data Analysis Baseline Library ☆130 Updated 3 months ago
- Tries to shrink your Pandas column dtypes with no data loss so you have more spare RAM ☆83 Updated last year
- 🎛 Distributed machine learning made simple. ☆49 Updated last year
- General Interpretability Package ☆58 Updated 2 years ago
- A machine learning testing framework for sklearn and pandas. The goal is to help folks assess whether things have changed over time. ☆101 Updated 3 years ago
- Documentation and resources for deploying JupyterHub on Hadoop ☆18 Updated 5 years ago
- Using the Parquet file format with Python ☆15 Updated last year
- Tutorial for a new versioning Machine Learning pipeline ☆81 Updated 3 years ago
- A collection of Machine Learning examples to get started with deploying RAPIDS in the Cloud ☆138 Updated 2 months ago
- A scikit-learn compatible estimator based on business-rules with interactive dashboard included ☆28 Updated 3 years ago
- Using Kafka-Python to illustrate a ML production pipeline ☆109 Updated 2 years ago
- Visualization ideas for data science ☆19 Updated 6 years ago
- Summarise and explore Pandas DataFrames ☆99 Updated 4 years ago
- Process, visualize and use data easily. ☆20 Updated last year
- Hypergol is a Data Science/Machine Learning productivity toolkit to accelerate any projects into production with autogenerated code, stan… ☆53 Updated last year
- A bit of extra usability for SQLAlchemy v2. ☆77 Updated 8 months ago
- Building an API with the FastAPI framework to serve a scikit-learn model. ☆18 Updated 6 years ago
- Distributed, large-scale, benchmarking framework for rigorous assessment of automatic machine learning repositories, projects, and librar… ☆30 Updated 2 years ago
- Guide for applying Unit Testing in data-driven projects ☆19 Updated 4 years ago
- Interactive visualization of machine learning model evaluation metrics ☆62 Updated 5 years ago