RWaltersMA / mongo-spark-jupyter
Docker environment that spins up a MongoDB replica set, Spark, and Jupyter Lab. Example code uses PySpark and the MongoDB Spark Connector.
☆40 Updated 2 years ago
Alternatives and similar repositories for mongo-spark-jupyter:
Users who are interested in mongo-spark-jupyter are comparing it to the repositories listed below.
- A Series of Notebooks on how to start with Kafka and Python ☆154 Updated last year
- Data lake, data warehouse on GCP ☆55 Updated 3 years ago
- End-to-end ELT data engineering project ☆20 Updated 2 years ago
- Create streaming data, transfer it to Kafka, modify it with PySpark, and load it into Elasticsearch and MinIO ☆59 Updated last year
- Repo that relates to the Medium blog 'Keeping your ML model in shape with Kafka, Airflow and MLFlow' ☆119 Updated last year
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆39 Updated 3 years ago
- Productionalizing Data Pipelines with Apache Airflow ☆111 Updated 2 years ago
- Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation,… ☆90 Updated 3 years ago
- A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract d… ☆24 Updated 3 years ago
- Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary. ☆29 Updated last year
- 🚨 Simple, self-contained fraud detection system built with Apache Kafka and Python ☆84 Updated 5 years ago
- This project helps me to understand the core concepts of Apache Airflow. I have created custom operators to perform tasks such as staging… ☆76 Updated 5 years ago
- Kafka variant of the MLOps Level 1 stack ☆24 Updated 2 years ago
- Airflow helm chart for AWS EKS ☆18 Updated 4 years ago
- PySpark functions and utilities with examples. Assists ETL process of data modeling ☆102 Updated 4 years ago
- (project & tutorial) dag pipeline tests + ci/cd setup ☆86 Updated 4 years ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆48 Updated 2 years ago
- Materials for the next course ☆24 Updated 2 years ago
- Spark, Airflow, Kafka ☆26 Updated last year
- Here I will be exploring various tools and methods that are used in the data engineering process with Python. ☆22 Updated 4 years ago
- ☆28 Updated last year
- A repository of sample code to show data quality checking best practices using Airflow. ☆74 Updated last year
- Docker Airflow - Contains a docker compose file for Airflow 2.0 ☆65 Updated 2 years ago
- The Python fake data producer for Apache Kafka® is a complete demo app allowing you to quickly produce JSON fake streaming datasets and … ☆82 Updated 9 months ago
- Simple stream processing pipeline ☆98 Updated 8 months ago
- Bunch of Airflow Configurations and DAGs for Kubernetes, Spark based data-pipelines. Scale inside Kubernetes using spark kubernetes maste… ☆23 Updated 3 years ago
- How to build and deploy an anonymization API with FastAPI and SpaCy ☆70 Updated 3 years ago
- Full stack data engineering tools and infrastructure set-up ☆49 Updated 4 years ago
- ☆63 Updated this week
- Spark data pipeline that processes movie ratings data. ☆28 Updated 3 weeks ago