Spratiher9 / SparkDataset
Instant search for and access to many datasets in PySpark.
☆34 Updated 2 years ago
Alternatives and similar repositories for SparkDataset:
Users who are interested in SparkDataset are comparing it to the libraries listed below:
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API ☆53 Updated 3 years ago
- ☆16 Updated 4 years ago
- ☆17 Updated 4 years ago
- Projects developed by Domino's R&D team ☆76 Updated 2 years ago
- Material for PyData Global 2021 Presentation: Effective Testing for Machine Learning Projects ☆81 Updated 3 years ago
- ☆16 Updated 4 years ago
- ☆12 Updated 4 years ago
- Automatically transform all categorical, date-time, NLP variables to numeric in a single line of code for any data set any size. ☆64 Updated 2 months ago
- ☆18 Updated 3 years ago
- A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics … ☆20 Updated 3 years ago
- Deploy A/B testing infrastructure in a containerized microservice architecture for Machine Learning applications. ☆40 Updated 3 months ago
- ☆16 Updated last year
- Record matching and entity resolution at scale in Spark ☆34 Updated last year
- Hypergol is a Data Science/Machine Learning productivity toolkit to accelerate any projects into production with autogenerated code, stan… ☆53 Updated 2 years ago
- PySpark Tutorial for Beginners on Google Colab: Hands-On Guide ☆16 Updated 4 years ago
- The fast.ai data ethics course ☆15 Updated 2 years ago
- An Implementation of ERNIE For Language Understanding (including Pre-training models and Fine-tuning tools) ☆27 Updated 5 years ago
- Spark NLP for Streamlit ☆15 Updated 3 years ago
- Confusion Matrix in Python: plot a pretty confusion matrix (like MATLAB) in Python using seaborn and matplotlib ☆19 Updated 3 years ago
- This repo is an approach to TDD in machine learning model operation. It covers project structure, testing essentials using pytest with Gi… ☆15 Updated 4 years ago
- ☆19 Updated 4 years ago
- Best practices for engineering ML pipelines. ☆35 Updated 2 years ago
- This repository hosts a template for calculating ROI on Artificial Intelligence projects ☆44 Updated 5 years ago
- This is a repository for the Duke University Cloud Computing course project on Serverless Data Engineering Pipeline. For this project, I r… ☆19 Updated 4 years ago
- ☆26 Updated 4 years ago
- Demo on how to use Prefect with Docker ☆25 Updated 2 years ago
- MinHash implementation in Python ☆11 Updated 7 months ago
- A scikit-learn compatible estimator based on business-rules with interactive dashboard included ☆28 Updated 3 years ago
- Machine Learning Projects with Flytekit ☆36 Updated last year
- Slides and notebook for the workshop on serving BERT models in production ☆25 Updated 2 years ago