yahoo / bandar-log

☆20

Related projects:

ExpediaGroup / hiveberg
Demonstration of a Hive Input Format for Iceberg
☆26 Updated 3 years ago
rovio / rovio-ingest
An implementation of the DatasourceV2 interface of Apache Spark™ for writing Spark Datasets to Apache Druid™.
☆41 Updated last week
FINRAOS / MegaSparkDiff
A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o…
☆49 Updated 8 months ago
maropu / spark-data-repair-plugin
Provide functionality to build statistical models to repair dirty tabular data in Spark
☆12 Updated last year
dremio-hub / dremio-flight-connector
Dremio Flight connector. Access Dremio using Arrow flight
☆40 Updated 3 years ago
paypal / dione
Dione - a Spark and HDFS indexing library
☆49 Updated 6 months ago
AbsaOSS / spark-hofs
Scala API for Apache Spark SQL high-order functions
☆14 Updated last year
avensolutions / cdc-at-scale-using-spark
Scalable CDC Pattern Implemented using PySpark
☆18 Updated 5 years ago
projectnessie / nessie-demos
Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake.
☆28 Updated 2 weeks ago
ExpediaGroup / shunting-yard
Shunting Yard is a real-time data replication tool that copies data between Hive Metastores.
☆20 Updated 2 years ago
B23admin / nifi-stateless-operator
An Operator for scheduling and executing NiFi Flows as Jobs on Kubernetes
☆53 Updated 4 years ago
AbsaOSS / atum
A dynamic data completeness and accuracy library at enterprise scale for Apache Spark
☆29 Updated 2 months ago
kbastani / climate-change-analysis
This repository contains a recipe for bootstrapping a climate analysis application using Apache Pinot and Superset
☆20 Updated 4 years ago
hortonworks-spark / spark-schema-registry
Schema Registry integration for Apache Spark
☆39 Updated last year
bullet-db / bullet-core
Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Stor…
☆41 Updated last year
pascaldevink / awesome-pulsar
A curated list of Apache Pulsar resources
☆13 Updated 5 years ago
ottogroup / SPQR
Spooker is a dynamic framework for processing high volume data streams via processing pipelines
☆29 Updated 8 years ago
wushujames / kafka-utilities
☆26 Updated 4 years ago
MrPowers / scalatest-example
Testing Scala code with scalatest
☆11 Updated last year
swoop-inc / spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
☆72 Updated 3 years ago
lensesio / kafka-testing
Repository for advanced unit-testing with embedded kafka services
☆25 Updated 5 years ago
linkedin / data-integration-library
The Data Integration Library project provides a library of generic components based on a multi-stage architecture for data ingress and eg…
☆28 Updated last month
phdata / pulse
phData Pulse application log aggregation and monitoring
☆13 Updated 4 years ago
amundsen-io / amundsengremlin
Amundsen Gremlin
☆20 Updated 2 years ago
indix / schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
☆111 Updated 4 years ago
AbsaOSS / enceladus
Dynamic Conformance Engine
☆30 Updated 4 months ago
confluentinc / castle
Castle is a test harness for Apache Kafka, Trogdor, and related projects.
☆0 Updated 4 months ago
cyanfr / dbvis_to_hortonworks_hiveserver2
Connect DBVisualizer to Hortonworks HiveServer2
☆9 Updated 9 years ago
AbsaOSS / hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
☆43 Updated 5 months ago