swoop-inc/spark-records

Bulletproof Apache Spark jobs with fast root cause analysis of failures.

☆73

Alternatives and similar repositories for spark-records

Users who are interested in spark-records are comparing it to the libraries listed below.

swoop-inc / spark-alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
☆191 · Oct 15, 2025 · Updated 9 months ago
AbsaOSS / spark-hofs
Scala API for Apache Spark SQL high-order functions
☆15 · Aug 4, 2023 · Updated 2 years ago
hammerlab / spark-tests
Utilities for writing tests that use Apache Spark.
☆24 · Dec 29, 2018 · Updated 7 years ago
AbsaOSS / atum
A dynamic data completeness and accuracy library at enterprise scale for Apache Spark
☆30 · May 13, 2026 · Updated 2 months ago
target / data-validator
A tool to validate data, built around Apache Spark.
☆102 · Jun 15, 2026 · Updated last month
mrpowers-io / spark-daria
Essential Spark extensions and helper methods ✨😲
☆767 · Jun 22, 2026 · Updated 3 weeks ago
mrpowers-io / spark-fast-tests
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
☆458 · Apr 2, 2026 · Updated 3 months ago
yaooqinn / itachi
A library that brings useful functions from various modern database management systems to Apache Spark
☆63 · Sep 4, 2023 · Updated 2 years ago
jasonsatran / spark-meta
Spark data profiling utilities
☆23 · Nov 24, 2018 · Updated 7 years ago
absognety / atomic-scala
Atomic Scala Book Solutions - for Beginners and first time Functional Programmers
☆12 · Mar 10, 2020 · Updated 6 years ago
univalence / centrifuge
Data quality tools for Big Data
☆19 · Oct 10, 2019 · Updated 6 years ago
wooplevip / sedis
SQL for Redis
☆11 · Sep 16, 2022 · Updated 3 years ago
DataDog / spark-jvm-profiler
Auto-archived due to inactivity. Simple JVM Profiler Using StatsD and Other Metrics Backends
☆15 · Oct 3, 2023 · Updated 2 years ago
apache / incubator-retired-amaterasu
Apache Amaterasu
☆56 · Oct 18, 2019 · Updated 6 years ago
alexarchambault / ammonite-spark
Run spark calculations from Ammonite
☆117 · Jul 2, 2026 · Updated 2 weeks ago
aravinthsci / Spark_Delta_Lake
Delta Lake Examples
☆11 · Apr 24, 2020 · Updated 6 years ago
fedragon / spark-jobserver-examples
Experiments with Ooyala's Spark Job Server
☆21 · Dec 14, 2014 · Updated 11 years ago
MrPowers / spark-stringmetric
Spark functions to run popular phonetic and string matching algorithms
☆60 · Feb 22, 2022 · Updated 4 years ago
MrPowers / spark-slack
Speak Slack notifications and process Slack slash commands
☆15 · Dec 20, 2018 · Updated 7 years ago
fqaiser94 / mse
Make Structs Easy (MSE)
☆18 · Jun 22, 2020 · Updated 6 years ago
BenFradet / struct-type-encoder
Deriving Spark DataFrame schemas from case classes
☆44 · Jun 24, 2024 · Updated 2 years ago
microsoft / hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
☆430 · Jan 14, 2022 · Updated 4 years ago
MrPowers / spark-sbt.g8
A giter8 template for Spark SBT projects
☆72 · Mar 20, 2021 · Updated 5 years ago
MrPowers / spark-spec
Test suite to document the behavior of Spark
☆21 · Apr 15, 2021 · Updated 5 years ago
LucaCanali / sparkMeasure
This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simp…
☆827 · May 19, 2026 · Updated 2 months ago
isarn / isarn-sketches
Sketching data structures for scala, including t-digest
☆15 · Sep 7, 2021 · Updated 4 years ago
chenharryhua / nanjin
explore kafka, fs2 and pure functional programming in scala
☆34 · Updated this week
falarica / steerd-presto-operator
Kubernetes (K8s) Operator for PrestoDB
☆46 · Sep 29, 2021 · Updated 4 years ago
andrewleverette / data_wrangling_with_rust
A series of articles that explore working with data using Datafusion and Apache Arrow.
☆10 · Mar 17, 2021 · Updated 5 years ago
Spratiher9 / SparkDataset
Instant search for and access to many datasets in Pyspark.
☆34 · Oct 6, 2022 · Updated 3 years ago
pocha / commu-sqs
Amazon SQS based (from) server (to) client guaranteed communication model in which server sends GCM to 'wake up' the client in case the c…
☆12 · Jul 13, 2015 · Updated 11 years ago
springnz / sparkplug
A framework for creating composable and pluggable data processing pipelines using Apache Spark, and running them on a cluster.
☆47 · Aug 1, 2016 · Updated 9 years ago
mrm1001 / spark_tutorial
Code for the Spark tutorial at the Pydata conference in London June 2015
☆12 · Oct 9, 2016 · Updated 9 years ago
Azure / spark-cdm
A Spark connector for the Azure Common Data Model
☆15 · May 31, 2023 · Updated 3 years ago
praetorian-inc / gcloud-lockdown
Scripts to demonstrate VPC Service Controls between tenant and shared projects
☆12 · Jun 11, 2019 · Updated 7 years ago
holdenk / spark-testing-base
Base classes to use when writing tests with Spark
☆1,553 · Apr 20, 2026 · Updated 3 months ago
bakdata / rebalancing-demo
Repository that showcases problems with Kafka rebalancing and explains how to fix them. Please visit our blog article to learn what Kafka…
☆12 · Aug 21, 2020 · Updated 5 years ago
mslinn / awslib_scala
An idiomatic Scala wrapper around the AWS Java SDK
☆22 · Dec 23, 2021 · Updated 4 years ago
xavierguihot / spark_helper
A bunch of low-level basic methods for data processing and monitoring with Scala Spark
☆10 · Jun 29, 2018 · Updated 8 years ago