Cloud-based Data Platform based on Apache Spark
☆27 · Feb 17, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for datapull
Users that are interested in datapull are comparing it to the libraries listed below
- NetEase Spark Courses ☆15 · Sep 4, 2018 · Updated 7 years ago
- Spark Structured Streaming JDBC Sink ☆16 · Apr 26, 2021 · Updated 4 years ago
- Extensible streaming ingestion pipeline on top of Apache Spark ☆46 · Jul 17, 2025 · Updated 7 months ago
- Data quality control tool built on Spark and Deequ ☆25 · Jan 22, 2026 · Updated last month
- Django with Data Science [Video], published by Packt ☆12 · Dec 15, 2025 · Updated 2 months ago
- Spark and Hive Docker containers sharing a common MySQL metastore ☆26 · Apr 17, 2020 · Updated 5 years ago
- Demo code for implementing and showcasing a Fraud Detection Engine with Apache Flink. ☆33 · Oct 20, 2022 · Updated 3 years ago
- Spark-Radiant is an Apache Spark Performance and Cost Optimizer ☆25 · Dec 31, 2024 · Updated last year
- Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌 ☆29 · May 15, 2020 · Updated 5 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures. ☆73 · Mar 14, 2021 · Updated 4 years ago
- ☆33 · Apr 23, 2019 · Updated 6 years ago
- HackerRank Programming Challenges ☆10 · May 8, 2021 · Updated 4 years ago
- A clean online résumé (CV) ☆13 · Jun 6, 2024 · Updated last year
- seckill flash-sale project [PRC] ☆10 · Apr 13, 2019 · Updated 6 years ago
- Second generation of the ICGC DCC release ETL built on Spark ☆10 · Apr 8, 2019 · Updated 6 years ago
- A timer module for Redis ☆11 · Oct 16, 2019 · Updated 6 years ago
- Integration of Iceberg table management into Spark SQL ☆11 · Jan 21, 2020 · Updated 6 years ago
- An exploration of Flink and change-data-capture via flink-cdc-connectors ☆11 · Jul 7, 2021 · Updated 4 years ago
- This is a list of YAML file examples for Docker, Kubernetes, Ansible. Also includes a Python script. ☆10 · Jan 12, 2021 · Updated 5 years ago
- Spark implementation of Slowly Changing Dimension type 2 ☆11 · Jan 8, 2019 · Updated 7 years ago
- ☆13 · Dec 5, 2022 · Updated 3 years ago
- A tool to validate data, built around Apache Spark. ☆101 · Feb 19, 2026 · Updated last week
- On-demand port forwarding to k8s. ☆23 · Feb 7, 2026 · Updated 3 weeks ago
- POC for the full big data stack (Kafka, Spark, Cassandra, HDFS, Docker, Spring Boot) ☆12 · Dec 16, 2022 · Updated 3 years ago
- Java Alerting Framework for ElasticSearch ☆12 · May 20, 2016 · Updated 9 years ago
- Client libraries of end users of Apache Kyuubi ☆11 · Jan 10, 2023 · Updated 3 years ago
- Atomic Scala Book Solutions - for Beginners and first time Functional Programmers ☆12 · Mar 10, 2020 · Updated 5 years ago
- Files for the Docker and Kubernetes on Google Cloud Hands-On labs ☆11 · Mar 14, 2023 · Updated 2 years ago
- A Fully HiveServer2-like Multi-tenancy Spark Thrift Server Supporting Impersonation and Multi-SparkContext with Ranger Authorization (GO … ☆10 · Jul 7, 2022 · Updated 3 years ago
- Run an open-source data LakeHouse locally using Docker Compose ☆12 · May 31, 2024 · Updated last year
- Exposes Redis stream through the command line ☆12 · Jun 28, 2022 · Updated 3 years ago
- Java OutOfMemory Example ☆11 · Jun 19, 2021 · Updated 4 years ago
- A simple golang job queue ☆13 · Jan 19, 2023 · Updated 3 years ago
- Ansible with Kubernetes ☆10 · Feb 14, 2023 · Updated 3 years ago
- All my LeetCode solutions in Java ☆11 · Aug 9, 2021 · Updated 4 years ago
- Flink connector for Redis ☆10 · Apr 22, 2023 · Updated 2 years ago
- Sample demo to deploy an Apache Kafka cluster and monitor it using Strimzi, Grafana and Prometheus operators. ☆10 · May 18, 2021 · Updated 4 years ago
- A boilerplate project for Azure Big Data PaaS services ☆14 · Dec 7, 2022 · Updated 3 years ago
- A proxy server implemented with Netty ☆11 · Nov 17, 2019 · Updated 6 years ago