vincentclaes / datajob
Build and deploy a serverless data pipeline on AWS with no effort.
☆111 Updated 2 years ago
Alternatives and similar repositories for datajob:
Users who are interested in datajob compare it with the libraries listed below.
- Example templates for the delivery of custom ML solutions to production so you can get started quickly without having to make too many de… ☆71 Updated 10 months ago
- Demo for GitHub Universe 2022 ☆12 Updated 2 years ago
- ☆60 Updated 3 years ago
- This repository contains the dbt-glue adapter ☆116 Updated 2 weeks ago
- Glue VSCode devcontainer setup ☆14 Updated 2 years ago
- Automated data quality suggestions and analysis with Deequ on AWS Glue ☆84 Updated 2 years ago
- Build DataOps platform with Apache Airflow and dbt on AWS ☆55 Updated 3 years ago
- ☆88 Updated last year
- A VS Code Extension to make it easier to manage and develop Spark jobs on EMR ☆36 Updated 2 months ago
- ☆73 Updated 11 months ago
- PySpark data-pipeline testing and CICD ☆28 Updated 4 years ago
- This repository shows a sample example to build, manage and orchestrate Machine Learning workflows using Amazon Sagemaker and Apache Airf… ☆136 Updated 3 years ago
- Spark runtime on AWS Lambda ☆107 Updated 7 months ago
- Run dbt serverless in the Cloud (AWS) ☆42 Updated 5 years ago
- ☆57 Updated 2 years ago
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ☆63 Updated 2 years ago
- Docker images that replicate the Amazon SageMaker Notebook instance. ☆58 Updated 3 years ago
- This repo will teach you how to deploy an ML-powered web app to AWS Fargate from start to finish using Streamlit and AWS CDK ☆108 Updated 4 years ago
- A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows. ☆80 Updated 11 months ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆113 Updated last year
- Using the Parquet file format with Python ☆15 Updated last year
- A CLI to manage and monitor permissions in AWS Lake Formation ☆26 Updated 2 years ago
- Example code for running Spark and Hive jobs on EMR Serverless. ☆163 Updated 3 months ago
- A simple and easy to use Data Quality (DQ) tool built with Python. ☆50 Updated last year
- ☆34 Updated 2 years ago
- Step Functions Data Science SDK for building machine learning (ML) workflows and pipelines on AWS ☆290 Updated 2 weeks ago
- Amazon SageMaker MLOps deployment pipeline for A/B Testing of machine learning models. ☆44 Updated 3 years ago
- This sample demonstrates how to setup an Amazon SageMaker MLOps end-to-end pipeline for Drift detection ☆60 Updated last year
- This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/) which creates a concu… ☆75 Updated 6 years ago
- (project & tutorial) dag pipeline tests + ci/cd setup ☆87 Updated 4 years ago