IntegriChain1 / s3parq
Parquet file management in S3 for Athena / Spectrum / Presto partitioning
☆22 Updated last week
Related projects
Alternatives and complementary repositories for s3parq
- Data Catalog for Databases and Data Warehouses ☆31 Updated 9 months ago
- A tool to learn JSON schema from collection of documents and generate Create table statement for Redshift ☆19 Updated 3 weeks ago
- A template for an AWS Lambda function that triggers Prefect Flow Runs ☆20 Updated 3 years ago
- Amundsen Gremlin ☆20 Updated 2 years ago
- Examples of various flow deployments for Prefect 1.0 (storage and run configurations) ☆35 Updated 2 years ago
- A collection of python utility functions ☆12 Updated 4 months ago
- ☆53 Updated last year
- A CLI to manage and monitor permissions in AWS Lake Formation ☆25 Updated last year
- CLI for data platform ☆19 Updated 11 months ago
- a pytest plugin for dbt adapter test suites ☆19 Updated last year
- Dask on ECS Fargate ☆14 Updated 5 years ago
- Dask integration for Snowflake ☆30 Updated 4 months ago
- [ARCHIVED] The Presto adapter plugin for dbt Core ☆33 Updated 10 months ago
- A proof-of-concept repo that attempts to use Apache Superset with a custom ADBC to Arrow Flight SQL SQLAlchemy driver. ☆22 Updated last year
- Generate Hive CREATE TABLE statements from json data ☆10 Updated 7 years ago
- Spawns JupyterHub single user servers in Docker containers running in AWS Fargate ☆47 Updated last month
- Personal finance project to automatically collect Swiss banking transactions into a DWH and visualise them ☆26 Updated 8 months ago
- Fully unit tested utility functions for data engineering. Python 3 only. ☆14 Updated 2 months ago
- Deploy Presto on the cloud easily, using Terraform and Packer ☆44 Updated last year
- The open source version of the Amazon Redshift Cluster Management Guide. ☆48 Updated last year
- run SQL queries on AWS Athena from jupyter notebooks ☆19 Updated 5 years ago
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆25 Updated this week
- Activity Schema dbt package ☆14 Updated last year
- Utility functions for dbt projects running on Spark ☆31 Updated last year
- A python package to create a database on the platform using our moj data warehousing framework ☆21 Updated 2 months ago
- 📆 Run, schedule, and manage your dbt jobs using Kubernetes. ☆24 Updated 6 years ago
- lakeview is a visibility tool for S3 based data lakes ☆30 Updated last year
- Data pipelines from re-usable components ☆106 Updated last year
- Data Profiler for AWS Glue Data Catalog application as described in the AWS Big Data Blog post "Build an automatic data profiling and rep… ☆19 Updated 4 years ago
- The sane way of building a data layer in Airflow ☆24 Updated 4 years ago