chuqiaoshen / Git-Influencer
Insight Data Engineering project: a platform built on HDFS, Spark, and Airflow to help you find social influencers in the GitHub network.
☆16Updated 10 months ago
Alternatives and similar repositories for Git-Influencer:
Users who are interested in Git-Influencer are comparing it to the repositories listed below
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…☆53Updated last year
- Udacity Data Pipeline Exercises☆15Updated 4 years ago
- A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for …☆134Updated 4 years ago
- Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database☆13Updated 3 years ago
- Realtime social media data analytics with Apache Spark, Python, Kafka, Pandas, etc☆51Updated 8 years ago
- Processing tweets using Spark Streaming and identifying top trending hashtags using a simple real-time dashboard☆41Updated 2 years ago
- A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority.☆29Updated last year
- Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation,…☆90Updated 3 years ago
- Public source code for the Batch Processing with Apache Beam (Python) online course☆18Updated 4 years ago
- 🚨 Simple, self-contained fraud detection system built with Apache Kafka and Python☆86Updated 5 years ago
- Basic tutorial of using Apache Airflow☆36Updated 6 years ago
- Learning from multiple companies in Silicon Valley. Netflix, Facebook, Google, Startups☆16Updated 6 years ago
- Ingest tweets with Kafka. Use Spark to track popular hashtags and trendsetters for each hashtag☆29Updated 8 years ago
- Apache Spark Interview Questions and Answers☆20Updated 4 years ago
- Use Kafka and Apache Spark streaming to perform click stream analytics☆76Updated 5 years ago
- 🐍💨 Airflow tutorial for PyCon 2019☆85Updated 2 years ago
- Jupyter notebooks for pyspark tutorials given at University☆107Updated 3 months ago
- Big Data Demystified meetup and blog examples☆31Updated 7 months ago
- Challenge for those applying to the Software Engineer, Big Data position☆34Updated 13 years ago
- Using Luigi to create a Machine Learning Pipeline using the Rossman Sales data from Kaggle☆33Updated 8 years ago
- ☆17Updated 6 years ago
- Slowly Changing Dimension type 2 using Hive query language using exclusive join technique with ORC Hive tables, partitioned and clustered…☆16Updated 5 years ago
- Build a scikit-learn model to predict churn using customer telco data.☆15Updated 3 months ago
- ☆148Updated 6 years ago
- Example of an ETL Pipeline using Airflow☆34Updated 7 years ago
- Code to build a simple analytics data pipeline with Python☆102Updated 8 years ago
- Educational notes, hands-on problems w/ solutions for Hadoop ecosystem☆87Updated 6 years ago
- Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as …☆16Updated 5 years ago
- ☆16Updated last year
- Data engineering interviews Q&A for data community by data community☆63Updated 4 years ago