waltherg / distributable_docker_sql_on_hadoop
Toy Hadoop cluster combining various SQL-on-Hadoop variants
☆12 Updated 7 years ago
Alternatives and similar repositories for distributable_docker_sql_on_hadoop:
Users interested in distributable_docker_sql_on_hadoop are comparing it to the repositories listed below.
- This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch. ☆41 Updated 8 years ago
- Ambari stack service for installing and managing Apache Airflow on HDP cluster ☆59 Updated 6 years ago
- A library for querying Druid data sources with Apache Spark ☆23 Updated 4 years ago
- ☆24 Updated 4 years ago
- Spark structured streaming with Kafka data source and writing to Cassandra ☆63 Updated 5 years ago
- Apache Spark ETL Utilities ☆40 Updated 2 months ago
- A proof of concept using Divolte, Kafka, Druid and Superset ☆61 Updated 4 years ago
- Real-time anomaly detection using Kafka, KSQL User Defined Function and a pre-trained model ☆30 Updated last year
- Collection of examples integrating NiFi with stream process frameworks. ☆56 Updated 8 years ago
- Spark Clickhouse Connector ☆72 Updated 4 years ago
- Spark Connector to read and write with Pulsar ☆113 Updated 2 months ago
- This application comes as Spark2.1-as-Service-Provider using an embedded, Reactive-Streams-based, fully asynchronous HTTP server ☆49 Updated last year
- ☆38 Updated 6 years ago
- Sample processing code using Spark 2.1+ and Scala ☆51 Updated 4 years ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 Updated 3 years ago
- A bridge to Apache Atlas for provenance metadata created in the course of using Apache NiFi ☆15 Updated 2 years ago
- Sample Spark Streaming application for secure consumption from Kafka ☆33 Updated 7 years ago
- A Spark datasource for the HadoopOffice library ☆39 Updated 2 years ago
- A modern real-time streaming application serving as a reference framework for developing a big data pipeline, complete with a broad range… ☆41 Updated 4 years ago
- ☆8 Updated 8 years ago
- Running Presto on k8s ☆38 Updated 5 years ago
- Hadoop, Hive, Parquet and Hue in docker-compose v3 ☆42 Updated 4 years ago
- DataQuality for BigData ☆143 Updated last year
- Flink Examples ☆39 Updated 8 years ago
- Ansible roles to install a Spark Standalone cluster (HDFS/Spark/Jupyter Notebook) or Ambari based Spark cluster ☆61 Updated 11 months ago
- A Spark metrics sink that pushes to InfluxDb ☆51 Updated 4 years ago
- Deploy your Spark Production Cluster on Kubernetes ☆47 Updated 4 years ago
- phData Pulse application log aggregation and monitoring ☆13 Updated 4 years ago
- Base Docker image with just essentials: Hadoop, Hive and Spark. ☆68 Updated 3 years ago
- Multi docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, … ☆34 Updated last month