MrPowers/bebe

Filling in the Spark function gaps across APIs

☆50

Alternatives and similar repositories for bebe

Users who are interested in bebe are comparing it to the libraries listed below.

yaooqinn / itachi
A library that brings useful functions from various modern database management systems to Apache Spark
☆63 · Sep 4, 2023 · Updated 2 years ago
mrpowers-io / spark-daria
Essential Spark extensions and helper methods ✨😲
☆767 · Jun 22, 2026 · Updated last month
zeotap / spark-property-tests
Write property based tests easily on spark dataframes
☆21 · Jan 19, 2024 · Updated 2 years ago
Spratiher9 / JumpSpark
JumpSpark - A modern cookiecutter template for pyspark projects with batteries included.
☆10 · May 12, 2023 · Updated 3 years ago
SaurabhChawla100 / spark-radiant
Spark-Radiant is Apache Spark Performance and Cost Optimizer
☆25 · Dec 31, 2024 · Updated last year
steveloughran / zero-rename-committer
Paper: A Zero-rename committer for object stores
☆20 · Nov 7, 2025 · Updated 8 months ago
MrPowers / spark-stringmetric
Spark functions to run popular phonetic and string matching algorithms
☆60 · Feb 22, 2022 · Updated 4 years ago
swoop-inc / spark-alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
☆191 · Oct 15, 2025 · Updated 9 months ago
cerndb / sparkMeasure
This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w…
☆16 · May 21, 2026 · Updated 2 months ago
mrpowers-io / spark-fast-tests
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
☆457 · Apr 2, 2026 · Updated 3 months ago
mrpowers-io / spark-style-guide
Spark style guide
☆271 · Sep 30, 2024 · Updated last year
minio / spark-select
A library for Spark DataFrame using MinIO Select API
☆102 · Sep 27, 2019 · Updated 6 years ago
zheyuan28 / SparkTaskMetrics
Task Metrics Explorer
☆14 · Apr 2, 2019 · Updated 7 years ago
godatadriven / ducklake-blog-1
Example files used in the DuckDB - Unity Catalog blog
☆10 · Dec 6, 2024 · Updated last year
mrpowers-io / jodie
Delta lake and filesystem helper methods
☆51 · Feb 29, 2024 · Updated 2 years ago
hablapps / sparkOptics
Optics for Spark DataFrames
☆48 · Mar 5, 2021 · Updated 5 years ago
databrickslabs / tika-ocr
☆22 · Oct 21, 2024 · Updated last year
rchillyard / Number
This project is about numbers: exact (1, e, π, 𝛙, √2, etc.), fuzzy e.g., 1836.152673426(32), or lazy e.g., cos(2π), as quantities (with …
☆16 · Jun 30, 2026 · Updated 3 weeks ago
holdenk / spark-upgrade
Magic to help Spark pipelines upgrade
☆34 · Jul 11, 2026 · Updated last week
Databeans / lighthouse
Shed light on your data layout in order to monitor the health of your Lakehouse tables and identify when data maintenance operations shou…
☆10 · Jul 31, 2023 · Updated 2 years ago
target / data-validator
A tool to validate data, built around Apache Spark.
☆102 · Jun 15, 2026 · Updated last month
pfcoperez / thebutlerdidit
A thread dump analyzer tool running on your browser or in your JVM that generates DOT documents out of `jstack` outputs.
☆10 · Jul 10, 2024 · Updated 2 years ago
Sanoma-CDA / maxmind-geoip2-scala
Simple Scala wrapper for MaxMind GeoIP2 webservice client and database reader http://maxmind.github.io/GeoIP2-java/
☆33 · May 19, 2021 · Updated 5 years ago
galliaproject / gallia-core
A schema-aware Scala library for data transformation
☆88 · Feb 23, 2024 · Updated 2 years ago
izeigerman / scalanum
☆12 · Jul 28, 2017 · Updated 8 years ago
richardanaya / spark_delta_lake
☆16 · Jun 27, 2020 · Updated 6 years ago
mikulskibartosz / check-engine
Data validation library for PySpark 3.0.0
☆33 · Nov 11, 2022 · Updated 3 years ago
amesar / spark-python-scala-udf
Demonstrates calling a Scala UDF from Python using spark-submit with an EGG and JAR
☆23 · Mar 3, 2020 · Updated 6 years ago
izeigerman / mindshard
A Rust-based HTTP proxy that captures and indexes HTTP traffic using vector embeddings.
☆15 · Jan 3, 2026 · Updated 6 months ago
yadavan88 / coursier-cheatsheets
Useful coursier commands
☆27 · Apr 25, 2025 · Updated last year
shazam / scala-datapipeline-dsl
Domain-specific language to help build and maintain AWS Data Pipelines
☆26 · Aug 22, 2018 · Updated 7 years ago
databrickslabs / arcuate
Delta Sharing + MLflow for ML model & experiment exchange (arcuate delta - a fan shaped river delta)
☆22 · Jan 29, 2026 · Updated 5 months ago
jparkie / Spark2Cassandra
Spark Library for Bulk Loading into Cassandra
☆12 · Apr 18, 2018 · Updated 8 years ago
hablapps / doric
Type safety for spark columns
☆79 · Oct 27, 2025 · Updated 8 months ago
jasonsatran / spark-meta
Spark data profiling utilities
☆23 · Nov 24, 2018 · Updated 7 years ago
awslabs / amazon-s3-tagging-spark-util
☆12 · Oct 16, 2023 · Updated 2 years ago
jjtolton / libapl-clj
GNU APL native interop for Clojure
☆29 · Mar 18, 2022 · Updated 4 years ago
aws-samples / dbtgluenyctaxidemo
☆11 · Oct 11, 2022 · Updated 3 years ago
newfront / spark-intro-to-ml
A Gentle introduction to Machine Learning with Apache Spark
☆11 · Mar 2, 2026 · Updated 4 months ago