jacopotagliabue / paas-data-ingestion
Ingesting data with Pulumi, AWS lambdas and Snowflake in a scalable, fully replayable manner
☆69 Updated 2 years ago
Related projects
Alternatives and complementary repositories for paas-data-ingestion
- Joining the modern data stack with the modern ML stack ☆193 Updated last year
- Playground for bringing large language models into the Modern Data Stack for entity matching ☆106 Updated last year
- Supporting materials/code examples for my course in data engineering for machine learning. ☆38 Updated 2 years ago
- A PaaS End-to-End ML Setup with Metaflow, Serverless and SageMaker. ☆37 Updated 3 years ago
- Demo repository to lambda-fy your dbt runs ☆11 Updated last year
- Sample project that uses Dagster, dbt, DuckDB and Dash to visualize the Spanish car and motorcycle market ☆55 Updated last year
- Materials for my 2021 NYU class on NLP and ML Systems (Master of Engineering). ☆96 Updated last year
- Kedro Plugin to support running workflows on Kubeflow Pipelines ☆53 Updated 2 months ago
- Recommendations at "Reasonable Scale": joining dataOps with recSys through dbt, Merlin and Metaflow ☆230 Updated last year
- Example repository showing how to build a data platform with Prefect, dbt and Snowflake ☆95 Updated last year
- A dbt package for doing product analytics ☆84 Updated 2 years ago
- Repo for orienting dbt users to the Dagster asset framework ☆50 Updated 2 years ago
- A SQL port of python's scikit-learn preprocessing module, provided as cross-database dbt macros. ☆180 Updated last year
- Define, govern, and model event data for warehouse-first product analytics. ☆82 Updated 4 months ago
- Food for thought around data contracts ☆24 Updated this week
- Example Dagster Cloud code for the Hooli Data Engineering organization. ☆76 Updated last week
- Demo of Streamlit application with Databricks SQL Endpoint ☆33 Updated 2 years ago
- A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows. ☆80 Updated 6 months ago
- Build your feature store with macros right within your dbt repository ☆37 Updated last year
- Delta Lake helper methods. No Spark dependency. ☆22 Updated 2 months ago
- IbisML is a library for building scalable ML pipelines using Ibis. ☆95 Updated last month
- A dbt-Core package for generating models from an activity stream. ☆39 Updated 7 months ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆111 Updated 7 months ago
- Open Data Stack Projects: Examples of End to End Data Engineering Projects ☆71 Updated last year
- Parse dbt artifacts and search dbt models with Algolia ☆52 Updated 3 years ago
- A curated list of awesome blogs, videos, tools and resources about Data Contracts ☆166 Updated 3 months ago
- Cost Efficient Data Pipelines with DuckDB ☆45 Updated 3 months ago
- A guide for leading a data (engineering) team ☆60 Updated 6 months ago
- Example repo to kickstart integration with mlflow pipelines. ☆73 Updated 2 years ago