yennanliu / knowledge_base_repo
Resources for software/backend/data learning | #SE | #DE | #DS
★16 Updated last month
Related projects
Alternatives and complementary repositories for knowledge_base_repo
- Various data stream/batch process demo with Apache Scala Spark ★11 Updated 4 years ago
- A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority. ★29 Updated last year
- Interactive Notebooks that support the book ★38 Updated 4 years ago
- A project with examples of using few commonly used data manipulation/processing/transformation APIs in Apache Spark 2.0.0 ★25 Updated 3 years ago
- Spark and Python (PySpark) Examples ★39 Updated 3 years ago
- Sentiment Analysis of a Twitter Topic with Spark Structured Streaming ★55 Updated 5 years ago
- The official repository for the Rock the JVM Spark Optimization 2 course ★37 Updated 11 months ago
- [ARCHIVED] Moved to github.com/NVIDIA/spark-xgboost-examples ★70 Updated 4 years ago
- Mastering Spark for Data Science, published by Packt ★46 Updated last year
- Because its never late to start taking notes and 'public' it... ★60 Updated 3 weeks ago
- ★19 Updated 6 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ★53 Updated last year
- ★30 Updated 5 years ago
- Educational notes,Hands on problems w/ solutions for hadoop ecosystem ★86 Updated 5 years ago
- Real-world Spark pipelines examples ★83 Updated 6 years ago
- ★38 Updated 6 years ago
- The official repository for the Rock the JVM Spark Optimization with Scala course ★55 Updated 11 months ago
- An example PySpark project with pytest ★17 Updated 7 years ago
- Demonstration code for MLeap, both Jupyter notebooks and projects ★24 Updated 5 years ago
- Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation,… ★89 Updated 2 years ago
- How to manage Slowly Changing Dimensions with Apache Hive ★55 Updated 5 years ago
- Batch Processing , orchestration using Apache Airflow and Google Workflows, spark structured Streaming and a lot more ★19 Updated 2 years ago
- A repo to track data engineering projects ★13 Updated 2 years ago
- ★37 Updated 8 years ago
- docs, codes and resources to prepare for the CRT020: Databricks Certified Associate Developer for Apache Spark 2.4 with Python 3 certific… ★9 Updated 5 years ago
- Apache Spark Interview Question and Answers ★21 Updated 4 years ago
- Finance Data Builder @ postgres ★18 Updated 3 years ago
- Nested Data (JSON/AVRO/XML) Parsing and Flattening in Spark ★15 Updated 9 months ago
- Repo for all my code on the articles I post on medium ★105 Updated 2 years ago