The source code for the book Modern Data Engineering with Apache Spark
☆39 Jul 26, 2022 Updated 3 years ago
Alternatives and similar repositories for spark-moderndataengineering
Users that are interested in spark-moderndataengineering are comparing it to the libraries listed below.
- Visits sessionization pipeline used for the talk ☆13 May 28, 2024 Updated last year
- A series of workshop modules introducing Feast feature store. ☆19 May 31, 2022 Updated 3 years ago
- Don't Panic. This guide will help you when it feels like the end of the world. ☆30 Feb 7, 2026 Updated last month
- Source Code for 'Beginning Apache Spark 3' by Hien Luu ☆13 Oct 14, 2021 Updated 4 years ago
- A platform and cloud-based service for data sharing based on the Delta Sharing protocol. ☆21 Jun 12, 2024 Updated last year
- Examples of spark-lucenerdd ☆15 Oct 6, 2023 Updated 2 years ago
- Model Context Protocol (MCP) server to interact with gRPC services using the grpcurl tool ☆16 Mar 5, 2025 Updated last year
- Code for Apache Hudi, Apache Iceberg and Delta Lake analysis ☆10 Feb 2, 2024 Updated 2 years ago
- Digital Transformation and Modernization with IBM API Connect, published by Packt ☆12 Jan 30, 2023 Updated 3 years ago
- A Gentle introduction to Machine Learning with Apache Spark ☆11 Mar 2, 2026 Updated 3 weeks ago
- Git repo to accompany the AWS DevOps Blog: Using AWS DevOps Tools to model and provision AWS Glue workflows ☆19 Nov 16, 2021 Updated 4 years ago
- ☆21 Aug 31, 2025 Updated 6 months ago
- Artificial Intelligence for Big Data, published by Packt ☆17 Mar 2, 2026 Updated 3 weeks ago
- GitHub Repository for Azure AI-102 Essentials to Learn, Implement, and Certify ☆32 Feb 11, 2026 Updated last month
- Covid19 and Iowa Liquor Sales analysis at BigQuery using dbt, Airflow, Marquez, Google Cloud and other modern data stack tools ☆14 Jun 18, 2022 Updated 3 years ago
- ☆51 Jan 28, 2026 Updated last month
- This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenario… ☆27 Mar 17, 2026 Updated last week
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in… ☆95 May 9, 2025 Updated 10 months ago
- ☆16 Oct 21, 2024 Updated last year
- Profiling Spark Applications for Performance Comparison and Diagnosis ☆17 Nov 11, 2018 Updated 7 years ago
- ☆113 Jan 15, 2025 Updated last year
- Plugin for Intake to read from SQL servers ☆15 May 29, 2023 Updated 2 years ago
- Managing Data as a Product, published by Packt ☆20 Nov 30, 2024 Updated last year
- A Flat Data GitHub Action demo repo ☆15 Jan 1, 2024 Updated 2 years ago
- ISS Tracker for the Cardputer Adv ☆37 Jan 19, 2026 Updated 2 months ago
- Generate Parquet Files ☆14 Mar 16, 2026 Updated last week
- Hands-on Labs (HOLs) and presentations for Microservices, Serverless and Containers readiness. ☆13 Dec 2, 2017 Updated 8 years ago
- ☆13 Feb 19, 2025 Updated last year
- End to end data pipeline ☆22 Apr 13, 2025 Updated 11 months ago
- ☆11 Oct 6, 2023 Updated 2 years ago
- Helm Chart for deploying Spark history server in Amazon EKS for S3 Spark Event Logs ☆29 Feb 9, 2026 Updated last month
- Mock streaming data generator ☆18 May 31, 2024 Updated last year
- Plane moment analysis with Apache Flink complex event processing ☆17 Jun 14, 2025 Updated 9 months ago
- ☆14 Feb 23, 2021 Updated 5 years ago
- Spark in Action, 2nd edition - chapter 16 - performance, checkpointing, and caching ☆12 Apr 21, 2023 Updated 2 years ago
- Auto-mirror of scoopinstaller/scoop-main bucket ☆12 Mar 19, 2026 Updated last week
- ☆22 Feb 7, 2024 Updated 2 years ago
- Events about the open source data stack ☆13 Apr 16, 2022 Updated 3 years ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 Jul 13, 2022 Updated 3 years ago