davlum / localemr
Local AWS EMR - A local service that imitates AWS EMR
☆27 Updated 2 years ago
Alternatives and similar repositories for localemr
Users who are interested in localemr are comparing it to the libraries listed below.
- Delta Lake helper methods. No Spark dependency. ☆23 Updated 10 months ago
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ☆64 Updated 3 years ago
- Pylint plugin for static code analysis on Airflow code ☆95 Updated 4 years ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆96 Updated this week
- Fast iterative local development and testing of Apache Airflow workflows ☆202 Updated 2 months ago
- Repository of helm charts for deploying DataHub on a Kubernetes cluster ☆191 Updated this week
- A Python Library to support running data quality rules while the spark job is running⚡ ☆189 Updated this week
- Visualize dependencies between Airflow DAGs ☆49 Updated 4 years ago
- Making DAG construction easier ☆267 Updated this week
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆218 Updated last month
- Schema modelling framework for decentralised domain-driven ownership of data. ☆254 Updated last year
- pytest plugin to run the tests with support of pyspark ☆86 Updated 2 months ago
- Airflow Providers containing Deferrable Operators & Sensors from Astronomer ☆149 Updated last week
- Great Expectations Airflow operator ☆168 Updated last week
- Adapter for dbt that executes dbt pipelines on Apache Flink ☆95 Updated last year
- The shared semantic layer definitions that dbt-core and MetricFlow use. ☆80 Updated 2 weeks ago
- PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows ☆43 Updated last month
- dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks ☆440 Updated last week
- A library that provides useful extensions to Apache Spark and PySpark. ☆228 Updated last week
- A best practices guide for using AWS EMR. The guide will cover best practices on the topics of cost, performance, security, operational e… ☆107 Updated last month
- Drop-in replacement for Apache Spark UI ☆277 Updated this week
- Enforce Best Practices for all your Airflow DAGs. ⭐ ☆104 Updated this week
- Library to convert DBT manifest metadata to Airflow tasks ☆48 Updated last year
- ✨ A Pydantic to PySpark schema library ☆98 Updated this week
- A VS Code Extension to make it easier to manage and develop Spark jobs on EMR ☆38 Updated 5 months ago
- Pipeline definitions for managing data flows to power analytics at MIT Open Learning ☆43 Updated this week
- ☆80 Updated 3 months ago
- ☆78 Updated 5 months ago
- DBND is an agile pipeline framework that helps data engineering teams track and orchestrate their data processes. ☆266 Updated 3 months ago
- PySpark schema generator ☆43 Updated 2 years ago