chuqiaoshen / Git-Influencer
Insight Data Engineering project: a platform built on HDFS, Spark, and Airflow to help you find social influencers in the GitHub network.
☆16 Updated last year
Alternatives and similar repositories for Git-Influencer
Users who are interested in Git-Influencer are comparing it to the repositories listed below.
- Sentiment Analysis of a Twitter Topic with Spark Structured Streaming ☆55 Updated 7 years ago
- Simple, self-contained fraud detection system built with Apache Kafka and Python ☆89 Updated 6 years ago
- Jupyter notebooks for pyspark tutorials given at University ☆110 Updated last week
- Supporting content (slides and exercises) for the Pearson video series covering best practices for developing scalable applications with … ☆53 Updated 11 months ago
- Basic tutorial of using Apache Airflow ☆36 Updated 7 years ago
- Challenge for those applying to the Software Engineer, Big Data position ☆35 Updated 14 years ago
- Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code. ☆126 Updated 4 years ago
- ☆151 Updated 7 years ago
- How to build an awesome data engineering team ☆101 Updated 6 years ago
- Udacity Data Pipeline Exercises ☆15 Updated 5 years ago
- PySpark phonetic and string matching algorithms ☆39 Updated last year
- Example custom model image trainable and distributable via AWS SageMaker ☆35 Updated last month
- Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, .... ☆77 Updated this week
- A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority. ☆32 Updated 2 years ago
- Public source code for the Batch Processing with Apache Beam (Python) online course ☆18 Updated 5 years ago
- Realtime social media data analytics with Apache Spark, Python, Kafka, Pandas, etc ☆51 Updated 9 years ago
- This project is created to promote and advocate the use of FOSS machine learning. ☆47 Updated 7 months ago
- Still in Beta ☆17 Updated 4 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆55 Updated 2 years ago
- AWS Big Data Certification ☆25 Updated 11 months ago
- Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation,… ☆89 Updated 4 years ago
- Airflow tutorial for PyCon 2019 ☆87 Updated 3 years ago
- ☆23 Updated 4 years ago
- Use Kafka and Apache Spark streaming to perform click stream analytics ☆76 Updated 5 years ago
- JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters. ☆31 Updated 3 years ago
- Databases: Concepts, commands, codes, interview questions and more... ☆57 Updated 3 years ago
- Repo for all my code on the articles I post on medium ☆107 Updated 3 years ago
- Big Data Demystified meetup and blog examples ☆31 Updated last year
- Graph databases, Knowledge Graphs, SPARQ ☆82 Updated 4 years ago
- Sharing interesting and noteworthy Data Engineering content ☆70 Updated 9 years ago