crflynn / pbspark
protobuf pyspark conversion
☆23 Updated last year
Alternatives and similar repositories for pbspark:
Users who are interested in pbspark are comparing it to the libraries listed below.
- Delta reader for the Ray open-source toolkit for building ML applications ☆43 Updated 11 months ago
- A Python Library to support running data quality rules while the spark job is running⚡ ☆167 Updated last week
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) ☆49 Updated last year
- Docker environment to stream data from Kafka to Iceberg tables ☆24 Updated 10 months ago
- A Python package that creates fine-grained dbt tasks on Apache Airflow ☆62 Updated 3 months ago
- Data validation library for PySpark 3.0.0 ☆34 Updated 2 years ago
- Bunch of Airflow Configurations and DAGs for Kubernetes, Spark based data-pipelines. Scale inside Kubernetes using spark kubernetes maste… ☆22 Updated 2 years ago
- The Internals of Spark on Kubernetes ☆70 Updated 2 years ago
- Airflow Providers containing Deferrable Operators & Sensors from Astronomer ☆142 Updated this week
- Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture ☆50 Updated last week
- Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and… ☆28 Updated last year
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆93 Updated this week
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆192 Updated last month
- Enforce Best Practices for all your Airflow DAGs. ⭐ ☆93 Updated this week
- A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow. ☆184 Updated 6 months ago
- Weekly Data Engineering Newsletter ☆94 Updated 6 months ago
- For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR ☆65 Updated 3 years ago
- A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran ☆20 Updated last week
- Demo DAGs that show how to run dbt Core in Airflow using Cosmos ☆52 Updated 3 months ago
- Magic to help Spark pipelines upgrade ☆34 Updated 3 months ago
- A library on top of either pex or conda-pack to make your Python code easily available on a cluster ☆46 Updated last month
- A pyspark lib to validate data quality ☆18 Updated 2 years ago
- Example of a scalable IoT data processing pipeline setup using Databricks ☆31 Updated 4 years ago