IntegriChain1 / s3parq
Parquet file management in S3 for Athena / Spectrum / Presto partitioning
☆22Updated 5 months ago
Alternatives and similar repositories for s3parq
Users who are interested in s3parq are comparing it to the libraries listed below:
- A template for an AWS Lambda function that triggers Prefect Flow Runs☆20Updated 3 years ago
- Data Catalog for Databases and Data Warehouses☆35Updated last year
- A collection of Python utility functions☆11Updated last year
- IceRunner is an Apache Arrow Flight Server Implementation for Apache Iceberg Tables☆9Updated 3 months ago
- An experimental Athena extension for DuckDB 🐤☆54Updated 6 months ago
- Jupyter Notebook Remote Scheduler for Argo on Kubernetes☆11Updated 7 months ago
- Dask integration for Snowflake☆30Updated 7 months ago
- Amundsen Gremlin☆21Updated 2 years ago
- A conda-smithy repository for python-duckdb.☆13Updated 2 weeks ago
- The open source version of the Amazon Redshift Cluster Management Guide.☆48Updated 2 years ago
- pysh-db - The Data Science Toolkit (DSK)☆13Updated 6 years ago
- DataHub on AWS demonstration resources☆10Updated 2 years ago
- Dask on ECS Fargate☆14Updated 5 years ago
- The sane way of building a data layer in Airflow☆24Updated 5 years ago
- A tool to learn a JSON schema from a collection of documents and generate a CREATE TABLE statement for Redshift☆21Updated 8 months ago
- ☆17Updated 2 months ago
- A CLI to manage and monitor permissions in AWS Lake Formation☆26Updated 2 years ago
- Data Profiler for AWS Glue Data Catalog application as described in the AWS Big Data Blog post "Build an automatic data profiling and rep…☆20Updated 5 years ago
- CLI tool to launch Spark jobs on AWS EMR☆67Updated last year
- Utilities for creating ETL pipelines with mara☆36Updated 3 years ago
- ☆11Updated 7 months ago
- ☆52Updated this week
- Python stream processing for analytics☆40Updated this week
- ☆15Updated 4 years ago
- Personal finance project to automatically collect Swiss banking transactions into a DWH and visualise them☆26Updated last year
- Hackney Data Platform Infrastructure and Code☆17Updated this week
- A serverless DuckDB deployment at GCP☆39Updated 2 years ago
- A python package to create a database on the platform using our moj data warehousing framework☆22Updated 3 weeks ago
- An infrastructure as code approach to deploying Snowflake using Terraform☆25Updated 2 years ago
- A PySpark library to validate data quality☆18Updated 2 years ago