xdssio / big_data_benchmarks
Comparisons of big data technologies for cleaning, manipulating, and generally wrangling data for analysis and machine learning.
☆65 Updated 4 years ago
Related projects
Alternatives and complementary repositories for big_data_benchmarks
- Automated Exploratory Data Analysis. Simplifying Data Exploration ☆34 Updated 4 years ago
- Hypergol is a Data Science/Machine Learning productivity toolkit to accelerate any projects into production with autogenerated code, stan… ☆53 Updated last year
- Makes Interactive Chart Widget, Cleans raw data, Runs baseline models, Interactive hyperparameter tuning & tracking ☆55 Updated 2 years ago
- Tries to shrink your Pandas column dtypes with no data loss so you have more spare RAM ☆82 Updated 9 months ago
- General Interpretability Package ☆58 Updated last year
- Automated Data Science and Machine Learning library to optimize workflow. ☆104 Updated last year
- Instant search for and access to many datasets in Pyspark. ☆34 Updated 2 years ago
- Record matching and entity resolution at scale in Spark ☆31 Updated last year
- ☆29 Updated 4 years ago
- MLOps simplified. One platform, all the functionality you need. Swiss made ☆95 Updated this week
- Talks about vaex ☆36 Updated last year
- Spark NLP for Streamlit ☆15 Updated 3 years ago
- H2OAI Driverless AI Code Samples and Tutorials ☆37 Updated 2 weeks ago
- ☆44 Updated 8 months ago
- An abstraction layer for parameter tuning ☆36 Updated 2 months ago
- Predict the poverty of households in Costa Rica using automated feature engineering. ☆23 Updated 4 years ago
- Comparing Polars to Pandas and a small introduction ☆43 Updated 3 years ago
- Python library for automated dataset normalization ☆111 Updated last year
- Python implementation of R package breakDown ☆41 Updated last year
- Repository for the research and implementation of categorical encoding into a Featuretools-compatible Python library ☆50 Updated 2 years ago
- Documentation and resources for deploying JupyterHub on Hadoop ☆18 Updated 5 years ago
- Repo for PyData 2019 Tutorial - New Trends in Estimation and Inference ☆26 Updated 5 years ago
- ForML - A development framework and MLOps platform for the lifecycle management of data science projects ☆104 Updated last year
- A machine learning testing framework for sklearn and pandas. The goal is to help folks assess whether things have changed over time. ☆101 Updated 3 years ago
- A package for data science practitioners. This library implements a number of helpful, common data transformations with a scikit-learn fr… ☆55 Updated 3 years ago
- JupyterLab extension to create GitHub commits & pull requests ☆113 Updated 4 months ago
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ☆42 Updated last year
- A scikit-learn compatible estimator based on business-rules with interactive dashboard included ☆28 Updated 3 years ago
- ☆29 Updated 10 months ago
- A frictionless integrated platform for notebook ☆85 Updated last year