chuqiaoshen / Git-Influencer
Insight Data Engineering project: A platform built on HDFS, Spark, and Airflow to help you find social influencers in the GitHub network.
☆16Updated 11 months ago
Alternatives and similar repositories for Git-Influencer:
Users who are interested in Git-Influencer are comparing it to the repositories listed below
- PySpark phonetic and string matching algorithms☆39Updated last year
- ☆18Updated last week
- Text similarity based on Word2Vec vectors.☆11Updated 8 years ago
- Sentiment Analysis of a Twitter Topic with Spark Structured Streaming☆55Updated 6 years ago
- 🚨 Simple, self-contained fraud detection system built with Apache Kafka and Python☆86Updated 5 years ago
- Blog post on ETL pipelines with Airflow☆23Updated 4 years ago
- Repository for Data Engineering Course☆57Updated last year
- Public source code for the Batch Processing with Apache Beam (Python) online course☆18Updated 4 years ago
- AWS Big Data Certification☆25Updated 3 months ago
- Public repository for the Search Fundamentals course taught by Daniel Tunkelang and Grant Ingersoll. Available at https://corise.com/cour…☆42Updated last year
- ☆12Updated 3 years ago
- Apache Spark Interview Question and Answers☆20Updated 4 years ago
- Analytics on Apache Projects for Diversity☆18Updated 5 years ago
- Data engineering interviews Q&A for data community by data community☆63Updated 4 years ago
- Sharing interesting and noteworthy Data Engineering content☆67Updated 8 years ago
- Big Data Demystified meetup and blog examples☆31Updated 8 months ago
- Basic tutorial of using Apache Airflow☆36Updated 6 years ago
- Various data stream/batch process demo with Apache Scala Spark 🚀☆11Updated 5 years ago
- This repo demonstrates how to load a sample Parquet formatted file from an AWS S3 Bucket. A Python job will then be submitted to an Apach…☆19Updated 8 years ago
- Sample code for building an end-to-end instant search solution☆39Updated last year
- Challenge for those applying to the Software Engineer, Big Data position☆35Updated 13 years ago
- Deploy an IMDB sentiment analysis model using kubernetes☆13Updated 2 years ago
- a data science blog☆15Updated last year
- 🐍💨 Airflow tutorial for PyCon 2019☆86Updated 2 years ago
- Build a recommendation engine with Spark and Watson Machine Learning☆46Updated 5 years ago
- Analytics for building Customer Journey Map in Ecommerce☆28Updated 5 years ago
- Spark data pipeline that processes movie ratings data.☆28Updated 3 weeks ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…☆54Updated last year
- Code to build a simple analytics data pipeline with Python☆102Updated 8 years ago
- ☆15Updated 2 years ago