Gelerion / spark-sketches
Integrating probabilistic algorithms into Spark using DataSketches
☆8 · Updated last year
Related projects
Alternatives and complementary repositories for spark-sketches
- Scala API for Apache Spark SQL high-order functions ☆14 · Updated last year
- Apache Spark ETL Utilities ☆40 · Updated 3 weeks ago
- A dynamic data completeness and accuracy library at enterprise scale for Apache Spark ☆30 · Updated last week
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆72 · Updated 3 years ago
- Scalable CDC Pattern Implemented using PySpark ☆18 · Updated 5 years ago
- Schema Registry integration for Apache Spark ☆39 · Updated last year
- A library that brings useful functions from various modern database management systems to Apache Spark ☆56 · Updated last year
- Type-class based data cleansing library for Apache Spark SQL ☆79 · Updated 5 years ago
- Avro Schema Shredder is a REST API that enables storage of Avro Schemas in Apache Atlas. This API enables an organization to use Apache A… ☆13 · Updated 7 years ago
- Utilities for writing tests that use Apache Spark. ☆24 · Updated 5 years ago
- A Spark datasource for the HadoopOffice library ☆39 · Updated 2 years ago
- Low-level helpers for Apache Spark libraries and tests ☆16 · Updated 5 years ago
- This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w… ☆14 · Updated 8 months ago
- Scala + Druid: Scruid. A library that allows you to compose queries in Scala, and parse the result back into typesafe classes. ☆115 · Updated 3 years ago
- functionstest ☆33 · Updated 8 years ago
- Google Spreadsheets datasource for SparkSQL and DataFrames ☆57 · Updated last year
- Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌 ☆28 · Updated 4 years ago
- Basic framework utilities to quickly start writing production ready Apache Spark applications ☆36 · Updated 2 months ago
- Extensible streaming ingestion pipeline on top of Apache Spark ☆44 · Updated 7 months ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o… ☆48 · Updated 10 months ago
- Code snippets used in demos recorded for the blog. ☆29 · Updated 3 weeks ago
- Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration. ☆70 · Updated last year
- A tool to validate data, built around Apache Spark. ☆101 · Updated this week
- Deriving Spark DataFrame schemas from case classes ☆44 · Updated 4 months ago
- Provide functionality to build statistical models to repair dirty tabular data in Spark ☆12 · Updated last year
- Common components used across the datamountaineer kafka connect connectors ☆21 · Updated 3 years ago
- Data quality tools for Big Data ☆19 · Updated 5 years ago
- Observability Python library - Powered by Kensu ☆22 · Updated 3 weeks ago