codingtony / docker-impala
Docker image that runs a Hadoop cluster in single-node mode with Impala server version 2.0.1. Based on CDH5.
☆40Updated 9 years ago
Alternatives and similar repositories for docker-impala:
Users that are interested in docker-impala are comparing it to the libraries listed below
- Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol☆34Updated 2 years ago
- Example project showing how to use Hive UDFs in Apache Spark☆55Updated 5 years ago
- A light Kafka to HDFS/S3 ETL library based on Apache Spark☆41Updated 7 years ago
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator☆73Updated 5 years ago
- Support Highcharts in Apache Zeppelin☆81Updated 7 years ago
- Reads an HBase table and writes it out as Text, Seq, Avro, or Parquet☆28Updated 10 years ago
- Enabling Spark Optimization through Cross-stack Monitoring and Visualization☆47Updated 7 years ago
- Schema Registry integration for Apache Spark☆40Updated 2 years ago
- HADOOP-CLI is an interactive command line shell that makes interacting with the Hadoop Distributed Filesystem (HDFS) simpler and more intu…☆36Updated 8 months ago
- High performance HBase / Spark SQL engine☆28Updated 2 years ago
- Starter project for building MemSQL Streamliner Pipelines☆32Updated 7 years ago
- A sink to save Spark Structured Streaming DataFrame into Hive table☆23Updated 6 years ago
- Example project to show how to use Spark to read and write Avro/Parquet files☆50Updated 11 years ago
- hive_compared_bq compares/validates 2 (SQL like) tables, and graphically shows the rows/columns that are different.☆28Updated 7 years ago
- Docker image for Apache Spark☆76Updated 5 years ago
- A Spark metrics sink that pushes to InfluxDb☆51Updated 4 years ago
- Interactive Audience Analytics with Spark and HyperLogLog☆55Updated 9 years ago
- A Spark datasource for the HadoopOffice library☆38Updated 2 years ago
- Cascading on Apache Flink®☆54Updated last year
- ☆70Updated 2 years ago
- Docker image for Apache Hive running on Tez☆25Updated 9 years ago
- Demonstrates how to submit a job to Spark on HDP directly via YARN's REST API from any workstation☆24Updated 8 years ago
- A connector for SingleStore and Spark☆160Updated 2 months ago
- Spark SQL index for Parquet tables☆134Updated 3 years ago
- functionstest☆33Updated 8 years ago
- A Real-Time Analytical Processing (RTAP) example using Spark/Shark☆51Updated 10 years ago
- Demo querying counts of an event stream with Apache Flink☆23Updated 6 years ago
- Docker image for Apache Zeppelin☆38Updated 7 years ago
- Framework for running macro benchmarks in a clustered environment☆24Updated 2 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.☆88Updated 11 months ago