Cloud-based Data Platform based on Apache Spark
☆27Feb 17, 2026Updated last month
Alternatives and similar repositories for datapull
Users that are interested in datapull are comparing it to the libraries listed below.
- Extensible streaming ingestion pipeline on top of Apache Spark☆46Jul 17, 2025Updated 8 months ago
- Spark and Hive docker containers sharing a common MySQL metastore☆26Apr 17, 2020Updated 5 years ago
- Run an open-source data LakeHouse locally using Docker Compose☆12May 31, 2024Updated last year
- Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌☆29May 15, 2020Updated 5 years ago
- Data Pipeline that utilizes GCP, Python 3.10, Prefect, and more.☆10Jan 23, 2023Updated 3 years ago
- Multi-stage, config driven, SQL based ETL framework using PySpark☆26Sep 16, 2019Updated 6 years ago
- Template for Scala Spark with Unit Test☆13Jul 24, 2023Updated 2 years ago
- Generate mock data based on an Apache Avro schema and specific cardinality settings☆10Apr 16, 2018Updated 7 years ago
- KnetBuilder data integration platform for building knowledge graphs. Previously known as ondex.☆15Apr 2, 2026Updated last week
- An Extensible Data Skipping Framework☆48Jul 15, 2025Updated 8 months ago
- ☆11Oct 11, 2022Updated 3 years ago
- ☆33Apr 23, 2019Updated 6 years ago
- Tool for visualizing Apache Oozie pipelines☆12Feb 15, 2016Updated 10 years ago
- Codec for Hadoop adding OpenPGP encryption using Bouncy Castle☆17Aug 18, 2011Updated 14 years ago
- ☆50Feb 11, 2020Updated 6 years ago
- A Gentle introduction to Machine Learning with Apache Spark☆11Mar 2, 2026Updated last month
- Spark-Radiant is Apache Spark Performance and Cost Optimizer☆25Dec 31, 2024Updated last year
- Trino monitoring with JMX metrics through Prometheus and Grafana☆17Aug 14, 2024Updated last year
- A Firebase Cloud Function and a Firebase hosted web app to treat weather data collected by Cloud IoT Core☆18Mar 10, 2019Updated 7 years ago
- ☆20Dec 16, 2020Updated 5 years ago
- An end-to-end workflow for processing streaming data on Azure.☆17Sep 20, 2024Updated last year
- ☆11Feb 14, 2020Updated 6 years ago
- A set of widgets for Python's Orange Machine Learning to work with Apache Spark ML☆15Dec 24, 2016Updated 9 years ago
- Citadel: Enterprise Search☆15May 2, 2023Updated 2 years ago
- Short Range Ultrasonic Radar - A simple radar using the ultrasonic sensor, this radar works by measuring a range from 3cm to 40 cm as non…☆19Nov 11, 2024Updated last year
- Spark-based pipeline to extract and parse monthly games from the Lichess database.☆21Sep 22, 2025Updated 6 months ago
- Optimizing downstream data processing with Amazon Kinesis Data Firehose and Amazon EMR running Apache Spark☆14Apr 14, 2023Updated 2 years ago
- Code Samples for my Ververica Webinar "99 Ways to Enrich Streaming Data with Apache Flink"☆41Jan 4, 2022Updated 4 years ago
- Basic Spark utilities☆13Feb 20, 2025Updated last year
- ☆16Jan 19, 2022Updated 4 years ago
- Python API for Deequ☆41Nov 10, 2020Updated 5 years ago
- Python wrapper for the Open Brewery DB API☆16Mar 7, 2024Updated 2 years ago
- This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.☆40Aug 31, 2016Updated 9 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures.☆73Mar 14, 2021Updated 5 years ago
- Automate data collection from Spotify's worldwide ranking in 50+ countries☆24May 3, 2020Updated 5 years ago
- A pyspark lib to validate data quality☆19Nov 11, 2022Updated 3 years ago
- Client libraries of end users of Apache Kyuubi☆11Jan 10, 2023Updated 3 years ago
- This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w…☆16Oct 3, 2025Updated 6 months ago
- API REST boilerplate using Spring Boot and Redis as database☆13Dec 26, 2018Updated 7 years ago