ExpediaGroup / apiary
Apiary provides modules which can be combined to create a federated cloud data lake
☆36 · Updated 7 months ago
Related projects
Alternatives and complementary repositories for apiary
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆86 · Updated 8 months ago
- Service for automatically managing and cleaning up unreferenced data ☆45 · Updated this week
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 · Updated 3 years ago
- A library for Spark DataFrame using MinIO Select API ☆96 · Updated 5 years ago
- Performance optimization for Spark running on Kubernetes ☆85 · Updated 4 years ago
- Apache Ranger Plugin for S3 ☆19 · Updated last year
- Extensions available for use in Apiary ☆10 · Updated 2 months ago
- ☆13 · Updated last week
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection ☆18 · Updated 7 years ago
- Demonstration of a Hive Input Format for Iceberg ☆26 · Updated 3 years ago
- A testing framework for Trino ☆26 · Updated this week
- Graph Analytics with Apache Kafka ☆101 · Updated this week
- A Spark-based data comparison tool at scale which helps software development engineers compare a plethora of pair combinations o… ☆48 · Updated 10 months ago
- Dremio Flight connector. Access Dremio using Arrow Flight ☆40 · Updated 3 years ago
- Amazon EMR on EKS Custom Image CLI ☆25 · Updated last month
- Extensible streaming ingestion pipeline on top of Apache Spark ☆44 · Updated 8 months ago
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating) ☆62 · Updated 6 months ago
- RocksDB state storage implementation for Structured Streaming. ☆16 · Updated 4 years ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆28 · Updated 2 weeks ago
- CLI tool to bulk migrate tables from one catalog to another without a data copy ☆61 · Updated this week
- Cloud Storage Connector integrates Apache Pulsar with cloud storage. ☆28 · Updated this week
- Amundsen Gremlin ☆20 · Updated 2 years ago
- ☆78 · Updated last year
- Setup for running Trino with Hive Metastore on Kubernetes ☆98 · Updated 2 years ago
- Deploy Presto on the cloud easily, using Terraform and Packer ☆44 · Updated last year
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆30 · Updated 2 weeks ago
- Paper: A Zero-rename committer for object stores ☆20 · Updated 3 years ago