The source code for the book Modern Data Engineering with Apache Spark
☆40 · Jul 26, 2022 · Updated 3 years ago
Alternatives and similar repositories for spark-moderndataengineering
Users that are interested in spark-moderndataengineering are comparing it to the libraries listed below.
- A series of workshop modules introducing Feast feature store. ☆18 · May 31, 2022 · Updated 3 years ago
- Don't Panic. This guide will help you when it feels like the end of the world. ☆30 · Feb 7, 2026 · Updated 2 months ago
- Data Streaming with Debezium, Kafka, Spark Streaming, Delta Lake, and MinIO ☆15 · May 15, 2024 · Updated last year
- Source Code for 'Beginning Apache Spark 3' by Hien Luu ☆13 · Oct 14, 2021 · Updated 4 years ago
- A platform and cloud-based service for data sharing based on the Delta Sharing protocol. ☆21 · Jun 12, 2024 · Updated last year
- Examples of spark-lucenerdd ☆15 · Oct 6, 2023 · Updated 2 years ago
- Model Context Protocol (MCP) server to interact with gRPC services using the grpcurl tool ☆16 · Mar 5, 2025 · Updated last year
- Using WASM to write UDFs in Apache Spark ☆12 · Jun 3, 2024 · Updated last year
- High Performance with Java, published by Packt ☆15 · Jul 18, 2024 · Updated last year
- ☆18 · Nov 2, 2023 · Updated 2 years ago
- Code for Apache Hudi, Apache Iceberg and Delta Lake analysis ☆10 · Feb 2, 2024 · Updated 2 years ago
- BigData Course ☆13 · Apr 7, 2026 · Updated last week
- Discover Bluemix, IBM Cloud Platform, through a set of hands-on labs. ☆12 · Feb 13, 2024 · Updated 2 years ago
- Artificial Intelligence for Big Data, published by Packt ☆17 · Mar 2, 2026 · Updated last month
- GitHub Repository for Azure AI-102 Essentials to Learn, Implement, and Certify ☆33 · Feb 11, 2026 · Updated 2 months ago
- Collection of code snippets for blogs, conferences, and talks ☆24 · Nov 1, 2022 · Updated 3 years ago
- ☆53 · Jan 28, 2026 · Updated 2 months ago
- This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenario… ☆27 · Mar 17, 2026 · Updated 3 weeks ago
- ☆22 · Mar 11, 2025 · Updated last year
- Code examples for functional programming ☆21 · Mar 3, 2025 · Updated last year
- Materials for Mike's PyCon Canada 2016 PySpark Tutorial ☆12 · Nov 13, 2016 · Updated 9 years ago
- Big Data infrastructure with Hadoop, Spark, Hive and NiFi deployed using Docker Compose. https://doi.org/10.5281/zenodo.18968438 ☆21 · Mar 11, 2026 · Updated last month
- Supporting material for courses, Facultad de Minas, Universidad Nacional de Colombia ☆19 · Feb 3, 2026 · Updated 2 months ago
- ☆12 · Jul 1, 2025 · Updated 9 months ago
- Managing Data as a Product, published by Packt ☆20 · Nov 30, 2024 · Updated last year
- A Flat Data GitHub Action demo repo ☆15 · Jan 1, 2024 · Updated 2 years ago
- ISS Tracker for the Cardputer Adv ☆38 · Jan 19, 2026 · Updated 2 months ago
- Generate Parquet Files ☆14 · Updated this week
- ☆12 · Mar 22, 2018 · Updated 8 years ago
- Hands-on Labs (HOLs) and presentations for Microservices, Serverless and Containers readiness. ☆13 · Dec 2, 2017 · Updated 8 years ago
- Trino (f.k.a. PrestoSQL) dialect for SQLAlchemy. ☆25 · May 5, 2022 · Updated 3 years ago
- Examples of using Neo4j with R. ☆22 · Jan 18, 2016 · Updated 10 years ago
- Official Dockerfile for Delta Lake ☆61 · Feb 24, 2026 · Updated last month
- End to end data pipeline ☆22 · Apr 13, 2025 · Updated last year
- I am using confluent Kafka cluster to produce and consume scraped data. In this project, I've created a real-time data pipeline that uti… ☆29 · May 2, 2023 · Updated 2 years ago
- Statically analyze sources and extract information about called or exported library functions in Python applications ☆21 · Apr 25, 2024 · Updated last year
- Hibernate and JPA course ☆17 · May 25, 2017 · Updated 8 years ago
- ☆11 · Oct 6, 2023 · Updated 2 years ago
- Helm Chart for deploying Spark history server in Amazon EKS for S3 Spark Event Logs ☆29 · Apr 4, 2026 · Updated last week