Simple Spark example of generating table stats for use in data quality checks
☆ 28 · Apr 28, 2017 · Updated 8 years ago
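The repository's own Spark code is not shown here. What follows is a minimal plain-Python sketch of the general idea it names: compute per-column table stats, then run data quality checks against them. It assumes the table arrives as a list of dicts, one per row. The function names `column_stats` and `failed_checks`, and the `max_null_ratio` threshold, are hypothetical and do not come from the project. In Spark, `DataFrame.describe()` and `summary()` already return some of these numbers (count, min, max, and so on).

```python
from typing import Any, Dict, List, Optional

# Hypothetical helper: per-column profile of a small in-memory table.
# Assumes every row is a dict and that non-null values within a column are
# mutually comparable (for min/max) and hashable (for the distinct count).
def column_stats(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    columns = sorted({col for row in rows for col in row})
    stats: Dict[str, Dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # a missing key counts as null
        non_null = [v for v in values if v is not None]
        stats[col] = {
            "count": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
        }
    return stats


# Hypothetical quality check: flag columns whose null ratio exceeds a threshold.
def failed_checks(stats: Dict[str, Dict[str, Any]],
                  max_null_ratio: float = 0.1) -> List[str]:
    failures: List[str] = []
    for col, s in stats.items():
        if s["count"] and s["nulls"] / s["count"] > max_null_ratio:
            failures.append(col)
    return failures


rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 7.5},
]
stats = column_stats(rows)
print(stats["amount"])       # count 3, nulls 1, distinct 2, min 7.5, max 10.0
print(failed_checks(stats))  # ['amount']
```

Keeping stats computation separate from the checks means the stats from one run can be saved and compared with later runs. That comparison is the usual reason to generate table stats for data quality in the first place.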
Alternatives and similar repositories for Spark.TableStatsExample
Users who are interested in Spark.TableStatsExample are comparing it to the libraries listed below.
- A framework for systematically quality controlling big data. ☆ 40 · Mar 13, 2023 · Updated 3 years ago
- Examples for Apache Oozie book ☆ 18 · May 30, 2016 · Updated 9 years ago
- Workshop for Hadoop Operations Best Practices ☆ 10 · Feb 24, 2015 · Updated 11 years ago
- Track app memory usage. ☆ 11 · Jan 13, 2015 · Updated 11 years ago
- The code for the in-memory data pipeline that was presented at Berlin Buzzwords 2015. ☆ 10 · Jun 1, 2015 · Updated 10 years ago
- Scala, DSL, rules-based reactive workflows and microservices ☆ 14 · Oct 20, 2025 · Updated 5 months ago
- Learning Apache Kylin for beginners ☆ 30 · Jun 7, 2018 · Updated 7 years ago
- A columnar serializer ☆ 15 · Feb 26, 2016 · Updated 10 years ago
- All my PyTorch projects reside here ☆ 33 · Dec 10, 2017 · Updated 8 years ago
- Python wrapper for the Hadoop WebHDFS REST API ☆ 32 · Apr 11, 2015 · Updated 10 years ago
- O'Reilly Course, In-Memory Computing Essentials ☆ 10 · Oct 16, 2020 · Updated 5 years ago
- Scala, Akka, and MongoDB web exercise for the Chicago-Area Scala Enthusiasts Meeting. ☆ 46 · Oct 17, 2019 · Updated 6 years ago
- Spark backend for dplyr ☆ 48 · Dec 30, 2015 · Updated 10 years ago
- ☆ 76 · May 19, 2015 · Updated 10 years ago
- Source code for 'Big Data SMACK' by Raul Estrada and Isaac Ruiz ☆ 15 · Mar 28, 2017 · Updated 8 years ago
- Custom Service for deploying Apache Alluxio on a running HDP 2.3 / IOP 4.1 Ambari-managed cluster ☆ 13 · Jan 13, 2017 · Updated 9 years ago
- Mastering Apache Camel by Packt Publishing ☆ 13 · Apr 14, 2023 · Updated 2 years ago
- ☆ 11 · Sep 23, 2015 · Updated 10 years ago
- Ansible scripts for deploying Kafka on EC2 ☆ 10 · Oct 7, 2016 · Updated 9 years ago
- Integrate Grafana with Ambari Metrics System ☆ 27 · Jun 13, 2025 · Updated 9 months ago
- Based on the design of SparkOnHBase. This repo will support Spark, Spark Streaming, and Spark SQL integration with Kudu. ☆ 50 · May 19, 2016 · Updated 9 years ago
- Using the fastai library to classify Twitter jokes in Spanish ☆ 12 · Jul 4, 2019 · Updated 6 years ago
- Coding interview questions with solutions and tests (Scala) ☆ 26 · Sep 23, 2025 · Updated 5 months ago
- Python client for WebHDFS REST API ☆ 43 · May 8, 2015 · Updated 10 years ago
- Let's run Ambari using Docker Compose (feat. FreeIPA) ☆ 10 · Nov 24, 2024 · Updated last year
- Spark package for checking data quality ☆ 223 · Feb 28, 2020 · Updated 6 years ago
- ☆ 13 · Jan 11, 2023 · Updated 3 years ago
- The Databend plugin for dbt (data build tool) ☆ 12 · Mar 17, 2023 · Updated 3 years ago
- Segment's bundled integration for Firebase on iOS ☆ 13 · Mar 26, 2024 · Updated last year
- Using JRecord to build a mapred and mapreduce InputFormat for HDFS, MapReduce, Pig, Hive, Spark, ... ☆ 19 · Dec 7, 2017 · Updated 8 years ago
- My HackerRank solutions: https://www.hackerrank.com/RohanKhude ☆ 12 · Jul 13, 2016 · Updated 9 years ago
- Core and community-developed monitoring integrations for the Sematext monitoring agent ☆ 13 · May 30, 2024 · Updated last year
- This is OpenMLDB's Spark Distribution, which is particularly optimized for feature extraction. It includes a few novel techniques, such a… ☆ 12 · Jul 30, 2024 · Updated last year
- Training materials and accompanying documentation for "Mastering Transformers: From Building Blocks to Real World Applications" training. ☆ 13 · Sep 13, 2023 · Updated 2 years ago
- Configures and builds a database for engagement events generated by Amazon Simple Email Service (SES) and Amazon Pinpoint engagements usi… ☆ 13 · Jan 16, 2025 · Updated last year
- Simplified custom plugins for Trino ☆ 16 · Jul 29, 2024 · Updated last year
- ☆ 10 · Aug 30, 2019 · Updated 6 years ago
- Presentation that gives a high-level overview of Go and some basic language use-case highlights. ☆ 15 · Jan 29, 2017 · Updated 9 years ago
- ☆ 18 · Sep 7, 2014 · Updated 11 years ago