Example repo for creating end-to-end tests for a data pipeline.
☆25Jun 14, 2024Updated last year
Alternatives and similar repositories for e2e_datapipeline_test
Users that are interested in e2e_datapipeline_test are comparing it to the libraries listed below.
- ☆16Apr 26, 2024Updated 2 years ago
- End to end data engineering project☆58Oct 27, 2022Updated 3 years ago
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/☆13May 24, 2024Updated last year
- Repository for Data Engineering Interview Series☆37Oct 17, 2024Updated last year
- Code to demonstrate data engineering metadata & logging best practices☆21Mar 12, 2024Updated 2 years ago
- Creating a modern data pipeline using a combination of Terraform, AWS Lambda and S3, Snowflake, DBT, Mage AI, and Dash.☆15Jun 26, 2023Updated 2 years ago
- A custom end-to-end analytics platform for customer churn☆11May 15, 2025Updated 11 months ago
- Project for "Data pipeline design patterns" blog.☆51Aug 6, 2024Updated last year
- Apache Spark using SQL☆14Aug 18, 2021Updated 4 years ago
- Simple stream processing pipeline☆112Jun 17, 2024Updated last year
- Near real time ETL to populate a dashboard.☆75Sep 9, 2025Updated 7 months ago
- Step by step instructions to create a production-ready data pipeline☆60Dec 23, 2024Updated last year
- Docker image for Spark history server on Kubernetes☆15Mar 13, 2020Updated 6 years ago
- Slowly Changing Dimension Type 2 (scd2) custom materialization☆11Apr 6, 2026Updated last month
- A simple Data Engineering solution for testing or education purposes. You only need to know SQL and Python to understand this project. Da…☆29Jul 2, 2022Updated 3 years ago
- Generate Python data structures and XML parser from Xschema (Python 3 port)☆12Jan 13, 2015Updated 11 years ago
- Repo for CDC with debezium blog post☆29Sep 15, 2024Updated last year
- The Demo for Blog: Modularization using Python and Docker (MicroService)☆12Feb 4, 2021Updated 5 years ago
- Tarot widget for website☆12Jan 6, 2023Updated 3 years ago
- ☆10Jan 28, 2025Updated last year
- ☆13Sep 25, 2024Updated last year
- Different ways to connect to storage in Azure Databricks☆11Jul 19, 2019Updated 6 years ago
- Beginner data engineering project - batch edition☆581Apr 13, 2026Updated 3 weeks ago
- Primary repository for NYC DCP's Data Engineering team☆39Updated this week
- Full stack data engineering tools and infrastructure set-up☆58Feb 13, 2021Updated 5 years ago
- ☆32Aug 13, 2018Updated 7 years ago
- ☆10Oct 20, 2022Updated 3 years ago
- Building Event Driven Application with AWS Lambda and Amazon Redshift Data API☆17Oct 27, 2020Updated 5 years ago
- Matching messy Pandas columns with FuzzyWuzzy (Medium Article)☆13Sep 29, 2019Updated 6 years ago
- Code for blog at: https://www.startdataengineering.com/post/docker-for-de/☆40Apr 29, 2024Updated 2 years ago
- A ready to use template for the CRISP-DM data science workflow☆14Apr 16, 2026Updated 2 weeks ago
- Understanding of POS tags and build a POS tagger from scratch☆11Jun 9, 2018Updated 7 years ago
- Your Top Spotify Listening Habits, Favorite Artists, and Song Recommendations in a Playlist🎧🎶☆19May 19, 2025Updated 11 months ago
- ☆23Jul 8, 2025Updated 9 months ago
- Cost Efficient Data Pipelines with DuckDB☆63May 14, 2025Updated 11 months ago
- Birgitta is a Python ETL test and schema framework, providing automated tests for pyspark notebooks/recipes.☆14Nov 9, 2023Updated 2 years ago
- Create a new Date, accepting more input types than normal, like Unix timestamps.☆14Jul 18, 2023Updated 2 years ago
- An Airflow pipeline for the collection of historical Twitter data☆10Aug 5, 2019Updated 6 years ago
- ☆12Apr 30, 2024Updated 2 years ago