ecloudvalley / Building-a-Data-Lake-with-AWS-Glue-and-Amazon-S3
☆17 · Updated 6 years ago
Alternatives and similar repositories for Building-a-Data-Lake-with-AWS-Glue-and-Amazon-S3
Users that are interested in Building-a-Data-Lake-with-AWS-Glue-and-Amazon-S3 are comparing it to the libraries listed below
- This repo demonstrates how to load a sample Parquet-formatted file from an AWS S3 bucket. A Python job will then be submitted to an Apach… ☆19 · Updated 9 years ago
- AWS Big Data Certification ☆25 · Updated 6 months ago
- Ingest tweets with Kafka. Use Spark to track popular hashtags and trendsetters for each hashtag ☆29 · Updated 9 years ago
- Datasets for CS109 ☆28 · Updated 11 years ago
- Spark and Python (PySpark) Examples ☆39 · Updated 4 years ago
- Tutorial repo for the article "ML in Production" ☆30 · Updated 2 years ago
- 🚨 Simple, self-contained fraud detection system built with Apache Kafka and Python ☆88 · Updated 6 years ago
- [Book-2019] Pragmatic AI: An Introduction to Cloud-based Machine Learning ☆137 · Updated 6 months ago
- Just a boilerplate for PySpark and Flask ☆35 · Updated 7 years ago
- ☆15 · Updated 7 years ago
- A Pyspark job to handle upserts, conversion to parquet and create partitions on S3 ☆26 · Updated 5 years ago
- A solution enabling customers to quickly deploy an architecture to identify and mask sensitive health data ☆26 · Updated 2 years ago
- Sentiment Analysis of a Twitter Topic with Spark Structured Streaming ☆55 · Updated 6 years ago
- All the code related to building my own data lake ☆21 · Updated 2 years ago
- Build a recommendation engine with Spark and Watson Machine Learning ☆46 · Updated 5 years ago
- An extension for Jupyter notebooks that allows running notebooks inside a Docker container and converting them to runnable Docker images. ☆28 · Updated last year
- PySpark phonetic and string matching algorithms ☆39 · Updated last year
- Build and deploy a serverless data pipeline on AWS with no effort. ☆111 · Updated 2 years ago
- Mastering Spark for Data Science, published by Packt ☆47 · Updated 2 years ago
- Big Data Demystified meetup and blog examples ☆31 · Updated 11 months ago
- Model management example using Polyaxon, Argo and Seldon ☆23 · Updated 6 years ago
- Basic tutorial of using Apache Airflow ☆36 · Updated 6 years ago
- A Scalable Data Cleaning Library for PySpark. ☆29 · Updated 6 years ago
- How to use Python to understand data and transform the data into a tidy format ready to be used for modelling and visualisation. ☆36 · Updated 6 years ago
- ☆10 · Updated 6 years ago
- Open innovation with 60 minute cloud experiments on AWS ☆88 · Updated last year
- An example PySpark project with pytest ☆16 · Updated 7 years ago
- ⭕️ Minimum Viable Machine Learning ☆33 · Updated 4 years ago
- Code to solve an open dataset of predictive maintenance of sheet breaks on a paper mill. ☆8 · Updated 4 years ago
- Code examples for the Introduction to Kubeflow course ☆14 · Updated 4 years ago