Helpers & syntactic sugar for PySpark.
☆62 · Dec 4, 2025 · Updated 5 months ago
Alternatives and similar repositories for sparkly
Users that are interested in sparkly are comparing it to the libraries listed below.
- Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry. ☆20 · Jan 11, 2018 · Updated 8 years ago
- Collect and aggregate on spark events for profitz ☆10 · Apr 22, 2022 · Updated 4 years ago
- Scala API for Apache Spark SQL high-order functions ☆14 · Aug 4, 2023 · Updated 2 years ago
- Resilient data pipeline framework running on Apache Spark ☆28 · Updated this week
- Kubernetes Course for SQL Server Big Data Clusters ☆12 · Jun 27, 2023 · Updated 2 years ago
- Apache (Py)Spark type annotations (stub files). ☆118 · Aug 17, 2022 · Updated 3 years ago
- sparkql: Apache Spark SQL DataFrame schema management for sensible humans ☆12 · Sep 18, 2023 · Updated 2 years ago
- A boilerplate for writing PySpark Jobs ☆394 · Jan 21, 2024 · Updated 2 years ago
- Asynchronous message queue consumer and scheduler ☆60 · Dec 15, 2017 · Updated 8 years ago
- Demonstrates how to submit a job to Spark on HDP directly via YARN's REST API from any workstation ☆23 · Apr 18, 2016 · Updated 10 years ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection ☆17 · Jan 12, 2017 · Updated 9 years ago
- Dynamic Conformance Engine ☆33 · Mar 26, 2026 · Updated 2 months ago
- Asynchronous actions for PySpark ☆47 · Dec 2, 2021 · Updated 4 years ago
- HADOOP-CLI is an interactive command line shell that makes interacting with the Hadoop Distributed Filesystem (HDFS) simpler and more intu… ☆37 · May 7, 2026 · Updated 3 weeks ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆238 · Mar 18, 2026 · Updated 2 months ago
- Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks ☆360 · Jun 6, 2017 · Updated 8 years ago
- Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines ☆17 · Jan 21, 2020 · Updated 6 years ago
- Spark app to merge different schemas ☆23 · Dec 21, 2020 · Updated 5 years ago
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆687 · Mar 6, 2025 · Updated last year
- Record matching and entity resolution at scale in Spark ☆36 · Oct 31, 2023 · Updated 2 years ago
- Write Web API clients using annotations in python ☆16 · May 1, 2026 · Updated 3 weeks ago
- Real-time query spark and visualise it as graph. ☆24 · Oct 4, 2017 · Updated 8 years ago
- Universal fuzzy selector for macOS comparable with dmenu ☆12 · Apr 22, 2019 · Updated 7 years ago
- A simple elasticsearch frontend for serving astrophysical simulation catalog data ☆10 · Mar 14, 2026 · Updated 2 months ago
- A Scalable Data Cleaning Library for PySpark. ☆29 · Apr 4, 2019 · Updated 7 years ago
- Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark ☆1,536 · Dec 2, 2024 · Updated last year
- A CLI to manage and monitor permissions in AWS Lake Formation ☆25 · Feb 8, 2023 · Updated 3 years ago
- ☆14 · Dec 27, 2016 · Updated 9 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆74 · Mar 14, 2021 · Updated 5 years ago
- ACID and BASE transactions explained ☆15 · May 18, 2025 · Updated last year
- A curated list of awesome Machine Learning frameworks, libraries and software. ☆19 · Jul 16, 2014 · Updated 11 years ago
- ☆19 · Oct 29, 2014 · Updated 11 years ago
- something to help you spark ☆65 · Oct 23, 2018 · Updated 7 years ago
- An OpenCalais API Interface for Python. ☆21 · Mar 13, 2012 · Updated 14 years ago
- load word embeddings to Torch.Tensor ☆14 · May 12, 2016 · Updated 10 years ago
- Bluetooth Bluez binding in Crystal ☆11 · Oct 11, 2018 · Updated 7 years ago
- Includes notes on using Apache Spark, with drill down on Spark for Physics, how to run TPCDS on PySpark, how to create histograms with S… ☆460 · May 19, 2026 · Updated last week
- Coding exercises for Apache Spark ☆104 · Jun 4, 2015 · Updated 10 years ago
- asyncio compatible driver for elasticsearch ☆99 · Apr 29, 2019 · Updated 7 years ago