The source code for the book Modern Data Engineering with Apache Spark
☆40 · Jul 26, 2022 · Updated 3 years ago
Alternatives and similar repositories for spark-moderndataengineering
Users that are interested in spark-moderndataengineering are comparing it to the libraries listed below.
- Visits sessionization pipeline used for the talk ☆13 · May 28, 2024 · Updated last year
- A series of workshop modules introducing Feast feature store. ☆18 · May 31, 2022 · Updated 3 years ago
- Source Code for 'Beginning Apache Spark 3' by Hien Luu ☆13 · Oct 14, 2021 · Updated 4 years ago
- A platform and cloud-based service for data sharing based on the Delta Sharing protocol. ☆21 · Jun 12, 2024 · Updated last year
- Using WASM to write UDFs in Apache Spark ☆12 · Jun 3, 2024 · Updated last year
- High Performance with Java, published by Packt ☆15 · Jul 18, 2024 · Updated last year
- Streamlit Cookbook, published by Packt ☆14 · Jun 6, 2025 · Updated 11 months ago
- Code for Apache Hudi, Apache Iceberg and Delta Lake analysis ☆10 · Feb 2, 2024 · Updated 2 years ago
- A Gentle introduction to Machine Learning with Apache Spark ☆11 · Mar 2, 2026 · Updated 2 months ago
- Digital Transformation and Modernization with IBM API Connect, published by Packt ☆12 · Jan 30, 2023 · Updated 3 years ago
- Git repo to accompany the AWS DevOps Blog: Using AWS DevOps Tools to model and provision AWS Glue workflows ☆19 · Nov 16, 2021 · Updated 4 years ago
- Discover Bluemix, IBM Cloud Platform, through a set of hands-on labs. ☆12 · Feb 13, 2024 · Updated 2 years ago
- Resources for the book "Functional and Concurrent Programming" ☆19 · Jan 16, 2026 · Updated 3 months ago
- Artificial Intelligence for Big Data, published by Packt ☆17 · Mar 2, 2026 · Updated 2 months ago
- sheriferson's dot, config, and setup files ☆14 · Dec 13, 2025 · Updated 4 months ago
- GitHub Repository for Azure AI-102 Essentials to Learn, Implement, and Certify ☆33 · Feb 11, 2026 · Updated 2 months ago
- BSR's new public API. Currently in development. ☆21 · Jan 26, 2026 · Updated 3 months ago
- ☆53 · Jan 28, 2026 · Updated 3 months ago
- This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenario… ☆27 · Mar 17, 2026 · Updated last month
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in… ☆95 · May 9, 2025 · Updated 11 months ago
- ☆16 · Oct 21, 2024 · Updated last year
- Materials for Mike's PyCon Canada 2016 PySpark Tutorial ☆12 · Nov 13, 2016 · Updated 9 years ago
- Big Data infrastructure with Hadoop, Spark, Hive and NiFi deployed using Docker Compose. https://doi.org/10.5281/zenodo.18968438 ☆21 · Mar 11, 2026 · Updated last month
- Create and manage organizations, users, apps, products and APIs with these scripts ☆19 · Jan 20, 2023 · Updated 3 years ago
- ☆113 · Jan 15, 2025 · Updated last year
- Plugin for Intake to read from SQL servers ☆15 · May 29, 2023 · Updated 2 years ago
- ELT Data Pipeline implementation in Data Warehousing environment ☆30 · May 2, 2025 · Updated last year
- Managing Data as a Product, published by Packt ☆21 · Nov 30, 2024 · Updated last year
- A Flat Data GitHub Action demo repo ☆15 · Jan 1, 2024 · Updated 2 years ago
- Rest API for Todobackend on top of Cassandra ☆26 · Feb 22, 2023 · Updated 3 years ago
- Generate Parquet Files ☆14 · Apr 23, 2026 · Updated last week
- ☆13 · Feb 19, 2025 · Updated last year
- End to end data pipeline ☆22 · Apr 13, 2025 · Updated last year
- Helm Chart for deploying Spark history server in Amazon EKS for S3 Spark Event Logs ☆29 · Apr 4, 2026 · Updated last month
- A Model Context Protocol server for Google Workspace integration (Gmail and Calendar) ☆30 · Dec 29, 2024 · Updated last year
- This repo is for the Linkedin Learning course: Learning Neo4j ☆26 · Jun 13, 2023 · Updated 2 years ago
- This repo provides the Kubernetes Helm chart for deploying Pyspark Notebook. ☆17 · Nov 16, 2022 · Updated 3 years ago
- Spark in Action, 2nd edition - chapter 16 - performance, checkpointing, and caching ☆12 · Apr 21, 2023 · Updated 3 years ago
- Spark in Action, 2nd edition - chapter 12 - Transforming your data ☆11 · Feb 6, 2024 · Updated 2 years ago