arthurprevot / yaetos
Write data & AI pipelines in (SQL, Spark, Pandas) and deploy to the cloud, simplified
☆35Updated 3 weeks ago
Alternatives and similar repositories for yaetos:
Users who are interested in yaetos are comparing it to the libraries listed below
- dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.☆57Updated 2 years ago
- DataHub on AWS demonstration resources☆10Updated 2 years ago
- Delta reader for the Ray open-source toolkit for building ML applications☆43Updated last year
- Profiles the data, validates the schema, runs data quality checks, and produces a report☆20Updated 5 years ago
- event-triggered plugins for airflow☆21Updated 5 years ago
- 📆 Run, schedule, and manage your dbt jobs using Kubernetes.☆24Updated 6 years ago
- Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, ....☆74Updated this week
- ☆43Updated 3 weeks ago
- Using the Parquet file format with Python☆15Updated last year
- Big Data Demystified meetup and blog examples☆31Updated 5 months ago
- Docker environment to stream data from Kafka to Iceberg tables☆24Updated 11 months ago
- ⚠️ MAINTENANCE-ONLY MODE: Snowplow maintained SQL data models for working with Snowplow web and mobile behavioral data.☆41Updated 3 weeks ago
- Data validation library for PySpark 3.0.0☆34Updated 2 years ago
- Skeleton project for Apache Airflow training participants to work on.☆16Updated 4 years ago
- ☆10Updated 6 years ago
- [ARCHIVED] The Presto adapter plugin for dbt Core☆33Updated last year
- Visualize dependencies between Airflow DAGs☆49Updated 3 years ago
- Weekly Data Engineering Newsletter☆94Updated 6 months ago
- The dbt adapter for Firebolt☆29Updated 2 weeks ago
- Utility functions for dbt projects running on Spark☆31Updated last week
- 💻 CLI for reporting events to Faros platform☆14Updated 3 months ago
- lakeview is a visibility tool for S3 based data lakes☆30Updated last year
- Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!☆224Updated this week
- Spark app to merge different schemas☆23Updated 4 years ago
- Data-aware orchestration with dagster, dbt, and airbyte☆31Updated 2 years ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection☆18Updated 8 years ago
- Dremio Flight connector. Access Dremio using Arrow flight☆40Updated 4 years ago
- ☆47Updated 5 months ago
- The sane way of building a data layer in Airflow☆24Updated 5 years ago