manuparra / starting-bigdata-aws
☆24 Updated 3 years ago
Related projects
Alternatives and complementary repositories for starting-bigdata-aws
- Real-world Spark pipelines examples ☆83 Updated 6 years ago
- Fuzzy matching function in spark (https://spark-packages.org/package/itspawanbhardwaj/spark-fuzzy-matching) ☆23 Updated 4 years ago
- Conversion utility from Zeppelin notes to Jupyter notebooks. ☆44 Updated 4 years ago
- WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging … ☆30 Updated 4 months ago
- Spark NLP for Streamlit ☆15 Updated 3 years ago
- Sample processing code using Spark 2.1+ and Scala ☆51 Updated 4 years ago
- The iterative broadcast join example code. ☆69 Updated 7 years ago
- Labs and data files for a full-day Spark workshop ☆24 Updated last year
- Simple machine learning in Python/Tensorflow with model saving ☆14 Updated 7 years ago
- Blog post on ETL pipelines with Airflow ☆23 Updated 4 years ago
- Example of a scalable IoT data processing pipeline setup using Databricks ☆31 Updated 3 years ago
- Code examples supporting the "Introduction to Apache Spark" video published by O'Reilly Media ☆37 Updated 2 years ago
- InsightEdge Core ☆20 Updated 7 months ago
- Nested Data (JSON/AVRO/XML) Parsing and Flattening in Spark ☆15 Updated 10 months ago
- Code and setup information for Introduction to Machine Learning with Spark ☆12 Updated 9 years ago
- Data Exploration Using Spark 2.0 ☆14 Updated 6 years ago
- Testing Scala code with scalatest ☆11 Updated 2 years ago
- Basic framework utilities to quickly start writing production ready Apache Spark applications ☆36 Updated 2 months ago
- PyConDE & PyData Berlin 2019 Airflow Workshop: Airflow for machine learning pipelines. ☆46 Updated last year
- Repository used for Spark Trainings ☆53 Updated last year
- Ingest tweets with Kafka. Use Spark to track popular hashtags and trendsetters for each hashtag ☆29 Updated 8 years ago
- My MSc on Data Science final project. This is a library for Data Pre-processing Algorithms for Streaming in Flink (DPASF) ☆18 Updated 5 years ago
- Tweet Analysis with Spark ☆15 Updated 7 years ago