chuqiaoshen / Git-Influencer
Insight Data Engineering project: a platform built on HDFS, Spark, and Airflow to help you find social influencers in the GitHub network.
☆17 · Updated 8 months ago
Alternatives and similar repositories for Git-Influencer:
Users who are interested in Git-Influencer are comparing it to the libraries listed below:
- Sharing interesting and noteworthy Data Engineering content ☆65 · Updated 8 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validatio… ☆53 · Updated last year
- Real-time social media data analytics with Apache Spark, Python, Kafka, Pandas, etc. ☆51 · Updated 8 years ago
- Udacity Data Pipeline Exercises ☆15 · Updated 4 years ago
- Ingest tweets with Kafka. Use Spark to track popular hashtags and trendsetters for each hashtag ☆29 · Updated 8 years ago
- The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on… ☆26 · Updated 2 years ago
- Challenge for those applying to the Software Engineer, Big Data position ☆34 · Updated 13 years ago
- AWS Big Data Certification ☆25 · Updated last week
- Public repository for the Search Fundamentals course taught by Daniel Tunkelang and Grant Ingersoll. Available at https://corise.com/cour… ☆41 · Updated last year
- Apache Spark Interview Questions and Answers ☆21 · Updated 4 years ago
- Building a JSON data pipeline within Snowflake using Streams and Tasks ☆26 · Updated 5 years ago
- Sentiment Analysis of a Twitter Topic with Spark Structured Streaming ☆55 · Updated 6 years ago
- Labs and data files for a full-day Spark workshop ☆24 · Updated last year
- A real-time event pipeline around the Kafka ecosystem for the Chicago Transit Authority. ☆29 · Updated last year
- 🚨 Simple, self-contained fraud detection system built with Apache Kafka and Python ☆83 · Updated 5 years ago
- Slowly Changing Dimension type 2 using Hive query language using exclusive join technique with ORC Hive tables, partitioned and clustered… ☆16 · Updated 5 years ago
- How to build an awesome data engineering team ☆99 · Updated 5 years ago
- An ETL pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables ☆12 · Updated 4 years ago
- #DataPipeLine #ETL - A Facebook data extraction utility that extracts publicly available data from Facebook. Used Facebook Grap… ☆14 · Updated 6 years ago
- Code to build a simple analytics data pipeline with Python ☆102 · Updated 7 years ago
- Processing tweets using Spark Streaming and identifying top trending hashtags using a real-time simple dashboard ☆41 · Updated 2 years ago
- How to use Python to understand data and transform the data into a tidy format ready to be used for modelling and visualisation. ☆37 · Updated 5 years ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform ☆47 · Updated 2 weeks ago
- Sample Airflow DAGs to load data from the CovidTracking API to Snowflake via an AWS S3 intermediary. ☆16 · Updated 4 years ago
- Data pipeline project ☆25 · Updated last year
- Using Luigi to create a Machine Learning Pipeline using the Rossmann Sales data from Kaggle ☆33 · Updated 8 years ago
- Use Kafka and Apache Spark Streaming to perform clickstream analytics ☆76 · Updated 4 years ago
- A full data warehouse infrastructure with ETL pipelines running inside Docker on Apache Airflow for data orchestration, AWS Redshift for … ☆132 · Updated 4 years ago
- Big Data Demystified meetup and blog examples ☆31 · Updated 5 months ago
- Data engineering interview Q&A for the data community, by the data community ☆62 · Updated 4 years ago