liliasfaxi / Atelier-Spark
Course and lab exercises on Apache Spark
☆11Updated 3 years ago
Alternatives and similar repositories for Atelier-Spark
Users that are interested in Atelier-Spark are comparing it to the libraries listed below
 - Build & Learn Data Engineering, Machine Learning over Kubernetes. No Shortcut approach.☆57Updated 2 years ago
 - The goal of this project is to build a docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Pos…☆74Updated 2 years ago
 - ☆19Updated last year
 - Repository for all ITVersity Vagrant Boxes.☆32Updated 5 years ago
 - EverythingApacheNiFi☆115Updated 2 years ago
 - Code base for airflow training series Getting easy with Apache Airflow☆41Updated 2 years ago
 - ☆88Updated 3 years ago
 - Create streaming data, transfer it to Kafka, modify it with PySpark, and send it to ElasticSearch and MinIO☆64Updated 2 years ago
 - ☆32Updated 4 years ago
 - Scraping my school's alumni data from LinkedIn using a bot 🤖☆25Updated 4 years ago
 - A Series of Notebooks on how to start with Kafka and Python☆152Updated 8 months ago
 - This contains instructions on how to install Hadoop on Google Colab and how to run MapReduce in Hadoop☆33Updated 5 years ago
 - This project shows how to capture changes from a Postgres database and stream them into Kafka☆38Updated last year
 - ☆27Updated last year
 - ☆24Updated 2 years ago
 - Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data☆48Updated last year
 - Dockerizing an Apache Spark Standalone Cluster☆43Updated 3 years ago
 - Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines☆134Updated 2 years ago
 - Spark all the ETL Pipelines☆35Updated 2 years ago
 - A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and D…☆30Updated 5 years ago
 - Road to Azure Data Engineer Part-II: DP-201 - Designing an Azure Data Solution☆19Updated 5 years ago
 - Cloudera_Material: Study Material to help people preparing for Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collab…☆39Updated 5 years ago
 - Apache Spark 3 - Structured Streaming Course Material☆124Updated 2 years ago
 - An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Ka…☆287Updated 8 months ago
 - ☆25Updated last year
 - Glue ETL job or EMR Spark that gets from data catalog, modifies and uploads to S3 and Data Catalog☆13Updated 2 years ago
 - Django-based course management platform for Zoomcamps☆73Updated 2 weeks ago
 - PySpark functions and utilities with examples. Assists ETL process of data modeling☆104Updated 4 years ago
 - Materials for the next course☆25Updated 2 years ago
 - Spark data pipeline that processes movie ratings data.☆30Updated last month