ggear / cloudera-framework
☆11 Updated 5 years ago
Alternatives and similar repositories for cloudera-framework
Users who are interested in cloudera-framework are comparing it to the libraries listed below.
- Build configuration-driven ETL pipelines on Apache Spark ☆162 Updated 3 years ago
- Spark connector for SFTP ☆98 Updated 2 years ago
- File compaction tool that runs on top of the Spark framework. ☆59 Updated 6 years ago
- Mirror of Apache Bahir ☆335 Updated 2 years ago
- ☆103 Updated 5 years ago
- Example project showing how to use Hive UDFs in Apache Spark ☆55 Updated 6 years ago
- Profiler for large-scale distributed java applications (Spark, Scalding, MapReduce, Hive,...) on YARN. ☆128 Updated 7 years ago
- ACID Data Source for Apache Spark based on Hive ACID ☆97 Updated 4 years ago
- Presto connector for Apache Kudu ☆48 Updated 6 years ago
- hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE ☆296 Updated 2 years ago
- Spark package for checking data quality ☆222 Updated 5 years ago
- Custom state store providers for Apache Spark ☆92 Updated 10 months ago
- Kerberos and Hadoop: The Madness beyond the Gate ☆280 Updated 2 years ago
- Spark Structured Streaming / Kafka / Cassandra / Elastic ☆183 Updated 2 years ago
- An Open Source unit test framework for Hive queries based on JUnit 4 and 5 ☆261 Updated 11 months ago
- Schema Registry ☆17 Updated last year
- Test your Hive scripts inside your favorite IDE with HiveQLUnit! Increase your developers productivity by testing on all operating system… ☆40 Updated 5 years ago
- Mirror of Apache Slider ☆77 Updated 7 years ago
- Spark, Spark Streaming and Spark SQL unit testing strategies ☆216 Updated 9 years ago
- Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ. ☆446 Updated 3 months ago
- Examples of Spark 2.0 ☆212 Updated 4 years ago
- Write your Spark data to Kafka seamlessly ☆174 Updated last year
- Enabling Spark Optimization through Cross-stack Monitoring and Visualization ☆47 Updated 8 years ago
- Remedy small files by combining them into larger ones. ☆194 Updated 3 years ago
- Mirror of Apache Atlas (Incubating) ☆95 Updated 2 years ago
- Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments. ☆284 Updated last month
- Structured Streaming Machine Learning example with Spark 2.0 ☆94 Updated 8 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆91 Updated last year
- Mirror of Apache Hivemall (incubating) ☆314 Updated 3 years ago
- DataQuality for BigData ☆145 Updated 2 years ago