ExpediaGroup / apiary
Apiary provides modules which can be combined to create a federated cloud data lake
☆36 Updated last year
Alternatives and similar repositories for apiary
Users who are interested in apiary compare it to the libraries listed below.
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆88 Updated last year
- An Operator for scheduling and executing NiFi Flows as Jobs on Kubernetes ☆53 Updated 5 years ago
- Demonstration of a Hive Input Format for Iceberg ☆26 Updated 4 years ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o… ☆51 Updated 3 weeks ago
- Service for automatically managing and cleaning up unreferenced data ☆46 Updated last week
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 Updated 3 years ago
- Deploy Presto on the cloud easily, using Terraform and Packer ☆45 Updated 2 years ago
- Dione - a Spark and HDFS indexing library ☆52 Updated last year
- Graph Analytics with Apache Kafka ☆104 Updated last week
- Amundsen Gremlin ☆21 Updated 2 years ago
- A library for Spark DataFrame using MinIO Select API ☆98 Updated 5 years ago
- Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Stor… ☆41 Updated 2 years ago
- Stream Discovery and Stream Orchestration ☆122 Updated 5 months ago
- Rokku project. This project acts as a proxy on top of any S3 storage solution providing services like authentication, authorization, shor… ☆69 Updated 4 months ago
- Cloud Storage Connector integrates Apache Pulsar with cloud storage. ☆28 Updated this week
- An implementation of the DatasourceV2 interface of Apache Spark™ for writing Spark Datasets to Apache Druid™. ☆43 Updated 2 weeks ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection ☆18 Updated 8 years ago
- Scalable CDC Pattern Implemented using PySpark ☆18 Updated 6 years ago
- An application that records stats about consumer group offset commits and reports them as prometheus metrics ☆14 Updated 6 years ago
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating) ☆61 Updated 7 months ago
- Explore Apache Kafka data pipelines in Kubernetes. ☆46 Updated last week
- A testing framework for Trino ☆26 Updated 3 months ago
- A library for strong, schema based conversion between 'natural' JSON documents and Avro ☆18 Updated last year
- Dremio Flight connector. Access Dremio using Arrow flight ☆40 Updated 4 years ago
- Docker Image and Kubernetes Configurations for Spark 2.x ☆41 Updated 5 years ago
- Kubernetes Operator for the Ververica Platform ☆35 Updated 2 years ago
- ☆14 Updated last month
- Quark is a data virtualization engine over analytic databases. ☆98 Updated 8 years ago
- Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple… ☆26 Updated 4 years ago
- Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline ☆75 Updated 2 years ago