yahoo / bandar-log
☆20 Updated this week
Related projects:
- Demonstration of a Hive Input Format for Iceberg ☆26 Updated 3 years ago
- An implementation of the DatasourceV2 interface of Apache Spark™ for writing Spark Datasets to Apache Druid™. ☆41 Updated last week
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o… ☆49 Updated 8 months ago
- Provide functionality to build statistical models to repair dirty tabular data in Spark ☆12 Updated last year
- Dremio Flight connector. Access Dremio using Arrow flight ☆40 Updated 3 years ago
- Dione - a Spark and HDFS indexing library ☆49 Updated 6 months ago
- Scala API for Apache Spark SQL high-order functions ☆14 Updated last year
- Scalable CDC Pattern Implemented using PySpark ☆18 Updated 5 years ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆28 Updated 2 weeks ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 Updated 2 years ago
- An Operator for scheduling and executing NiFi Flows as Jobs on Kubernetes ☆53 Updated 4 years ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆29 Updated 2 months ago
- This repository contains a recipe for bootstrapping a climate analysis application using Apache Pinot and Superset ☆20 Updated 4 years ago
- Schema Registry integration for Apache Spark ☆39 Updated last year
- Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Stor… ☆41 Updated last year
- A curated list of Apache Pulsar resources ☆13 Updated 5 years ago
- Spooker is a dynamic framework for processing high volume data streams via processing pipelines ☆29 Updated 8 years ago
- ☆26 Updated 4 years ago
- Testing Scala code with scalatest ☆11 Updated last year
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆72 Updated 3 years ago
- Repository for advanced unit-testing with embedded kafka services ☆25 Updated 5 years ago
- The Data Integration Library project provides a library of generic components based on a multi-stage architecture for data ingress and eg… ☆28 Updated last month
- phData Pulse application log aggregation and monitoring ☆13 Updated 4 years ago
- Amundsen Gremlin ☆20 Updated 2 years ago
- Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API. ☆111 Updated 4 years ago
- Dynamic Conformance Engine ☆30 Updated 4 months ago
- Castle is a test harness for Apache Kafka, Trogdor, and related projects. ☆0 Updated 4 months ago
- Connect DBVisualizer to Hortonworks HiveServer2 ☆9 Updated 9 years ago
- Extensible streaming ingestion pipeline on top of Apache Spark ☆43 Updated 5 months ago