Cloud-based data platform built on Apache Spark
☆28 · May 21, 2026 · Updated this week
Alternatives and similar repositories for datapull
Users that are interested in datapull are comparing it to the libraries listed below.
- Extensible streaming ingestion pipeline on top of Apache Spark ☆46 · Jul 17, 2025 · Updated 10 months ago
- Shunting Yard is a real-time data replication tool that copies data between Hive Metastores. ☆20 · Oct 11, 2021 · Updated 4 years ago
- Spark Structured Streaming JDBC Sink ☆16 · Apr 26, 2021 · Updated 5 years ago
- NetEase Spark Courses ☆15 · Sep 4, 2018 · Updated 7 years ago
- Spark package to "plug" holes in data using SQL-based rules ⚡️ 🔌 ☆28 · May 15, 2020 · Updated 6 years ago
- Multi-stage, config-driven, SQL-based ETL framework using PySpark ☆26 · Sep 16, 2019 · Updated 6 years ago
- Template for Scala Spark projects with unit tests ☆13 · Jul 24, 2023 · Updated 2 years ago
- ☆16 · Jun 27, 2020 · Updated 5 years ago
- Alerting and monitoring tool for Apache Spark ☆23 · May 20, 2022 · Updated 4 years ago
- Generate mock data based on an Apache Avro schema and specific cardinality settings ☆10 · Apr 16, 2018 · Updated 8 years ago
- ☆33 · Apr 23, 2019 · Updated 7 years ago
- ☆11 · Oct 11, 2022 · Updated 3 years ago
- Tool for visualizing Apache Oozie pipelines ☆13 · Feb 15, 2016 · Updated 10 years ago
- An Extensible Data Skipping Framework ☆48 · Jul 15, 2025 · Updated 10 months ago
- Codec for Hadoop adding OpenPGP encryption using Bouncy Castle ☆17 · Aug 18, 2011 · Updated 14 years ago
- ☆50 · Feb 11, 2020 · Updated 6 years ago
- A gentle introduction to machine learning with Apache Spark ☆11 · Mar 2, 2026 · Updated 2 months ago
- Spark-Radiant is an Apache Spark performance and cost optimizer ☆25 · Dec 31, 2024 · Updated last year
- Data quality control tool built on Spark and Deequ ☆25 · May 9, 2026 · Updated 2 weeks ago
- A set of widgets for the Orange machine learning suite for Python to work with Apache Spark ML ☆15 · Dec 24, 2016 · Updated 9 years ago
- ☆11 · Feb 14, 2020 · Updated 6 years ago
- HDFS, based on the Java implementation, as a remote ObjectStore for DataFusion ☆10 · Feb 13, 2024 · Updated 2 years ago
- ☆12 · Oct 16, 2023 · Updated 2 years ago
- Citadel: Enterprise Search ☆15 · May 2, 2023 · Updated 3 years ago
- Demo code for implementing and showcasing a Fraud Detection Engine with Apache Flink. ☆33 · Oct 20, 2022 · Updated 3 years ago
- In this project I have built an ETL pipeline which scrapes the trending repositories by month, week and day, LIVE-extracts other related info… ☆12 · Sep 9, 2023 · Updated 2 years ago
- Code Samples for my Ververica Webinar "99 Ways to Enrich Streaming Data with Apache Flink" ☆41 · Jan 4, 2022 · Updated 4 years ago
- Python API for Deequ ☆41 · Nov 10, 2020 · Updated 5 years ago
- Python wrapper for the Open Brewery DB API ☆16 · Mar 7, 2024 · Updated 2 years ago
- This is an example of real-time stream processing using Spark Streaming, Kafka & Elasticsearch. ☆40 · Aug 31, 2016 · Updated 9 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆74 · Mar 14, 2021 · Updated 5 years ago
- A PySpark library to validate data quality ☆19 · Nov 11, 2022 · Updated 3 years ago
- This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark w… ☆16 · Oct 3, 2025 · Updated 7 months ago
- REST API boilerplate using Spring Boot and Redis as the database ☆13 · Dec 26, 2018 · Updated 7 years ago
- Due to the lack of resources on how to deploy Kafka with simple SASL authentication (just username and password) and how to write producer an… ☆12 · Dec 29, 2021 · Updated 4 years ago
- A tool to validate data, built around Apache Spark. ☆102 · May 13, 2026 · Updated last week
- End-to-end deployment of e-commerce customer segmentation using clustering machine learning algorithms on Google Cloud Platform and MLOp… ☆19 · Jun 5, 2024 · Updated last year
- Example of creating lineage in Apache Atlas with Sqoop and Spark ☆14 · Apr 5, 2017 · Updated 9 years ago
- Service for automatically managing and cleaning up unreferenced data ☆50 · Apr 24, 2026 · Updated last month