subhamkharwal/ease-with-apache-spark


subhamkharwal / ease-with-apache-spark

A series on learning Apache Spark (PySpark), with quick tips and workarounds for everyday problems.

☆55

Alternatives and similar repositories for ease-with-apache-spark

Users that are interested in ease-with-apache-spark are comparing it to the libraries listed below.

subhamkharwal / docker-images
Public Docker Images for popular services
☆56 · Sep 7, 2025 · Updated 10 months ago
airscholar / modern-data-eng-dbt-databricks-azure
In this project, we set up end-to-end data engineering using Apache Spark, Azure Databricks, and Data Build Tool (DBT), using Azure as our …
☆38 · Dec 18, 2023 · Updated 2 years ago
airscholar / cicd_for_data_engineering
This project showcases how to integrate the world of DevOps, focusing on Continuous Integration (CI) and Continuous Deployment (CD) with …
☆14 · Dec 27, 2023 · Updated 2 years ago
Kushalkhadka7 / dagster_clickhouse_dbt
dbt and ClickHouse test project with Dagster
☆12 · Aug 29, 2023 · Updated 2 years ago
SatadruMukherjee / Data-Preprocessing-Models
☆69 · Jun 21, 2026 · Updated last month
Rajsingh92 / MUST_HAVE_SKILLS
This repo consists of all important concepts for data engineers.
☆11 · Jun 2, 2026 · Updated last month
CongHieuTruong / scroll-follow-tab
🚀 Scroll Follow Tab is a lightweight JavaScript library with no jQuery and no dependencies. It is used to create a scrollspy effect for your me…
☆16 · May 9, 2024 · Updated 2 years ago
kiranskmr / workflows_automation
Automation of Databricks workflows
☆13 · Nov 9, 2025 · Updated 8 months ago
ome / awstats-docker
Run Awstats in Docker
☆13 · Jul 18, 2019 · Updated 7 years ago
dogukannulu / streaming_data_processing
Create streaming data, transfer it to Kafka, modify it with PySpark, and load it into Elasticsearch and MinIO
☆66 · Jul 21, 2023 · Updated 3 years ago
JChz6 / DE-ZCamp-Project
☆15 · Mar 29, 2024 · Updated 2 years ago
raveendratal / ravi_azureadbadf
Ravi Azure ADB ADF Repository
☆65 · Jan 25, 2025 · Updated last year
mrn-aglic / apache-iceberg-data-exploration
☆23 · Feb 5, 2024 · Updated 2 years ago
sayakpaul / Emotion-Detection-using-Deep-Learning
This project demonstrates the use of Deep Learning to detect emotion (sad, angry, happy etc) from the images of faces.
☆11 · Feb 14, 2020 · Updated 6 years ago
PacktPublishing / Simplify-Big-Data-Analytics-with-Amazon-EMR-
Simplify Big Data Analytics with Amazon EMR, published by Packt
☆13 · Jan 18, 2023 · Updated 3 years ago
gutfeeling / twitass
Scrapes tweets from the Twitter Advanced Search webpage, bypassing the 7-day historical limit of the public API
☆14 · Aug 31, 2017 · Updated 8 years ago
azavea / hot-osm-population
Estimate OSM building coverage completeness by comparing it against the WorldPop raster
☆12 · Nov 16, 2018 · Updated 7 years ago
SomanathSankaran / spark_medium
My Git Repo for Csv Data
☆21 · Oct 5, 2025 · Updated 9 months ago
oracle-samples / sample-serverless-saas-erp-dataload
Sample code demonstrating how you can use Oracle Cloud Infrastructure serverless components to load data into Oracle Fusion ERP
☆13 · Aug 24, 2023 · Updated 2 years ago
gbrueckl / Fabric.Toolbox
Tools for Microsoft Fabric
☆26 · Jun 26, 2026 · Updated last month
simardeep1792 / Data-Engineering-Streaming-Project
☆45 · Jul 6, 2024 · Updated 2 years ago
xxsacxx / ML_algos
This repo tries to cover the most frequently used ML algorithms, with proper explanations and examples
☆10 · Apr 14, 2019 · Updated 7 years ago
HoussemDellai / azure-devops-pipelines-samples
Demoing how to use Matrix and Each definitions in Azure DevOps YAML pipelines.
☆20 · Apr 1, 2026 · Updated 3 months ago
airscholar / e2e-data-engineering
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Ka…
☆337 · Feb 14, 2025 · Updated last year
aws-samples / amazon-textract-queries-example
☆21 · Apr 13, 2026 · Updated 3 months ago
juanludataanalyst / langgraph-conversational-patterns
☆20 · Nov 12, 2025 · Updated 8 months ago
airscholar / Kubernetes-For-DataEngineering
This repository contains the necessary configuration files and DAGs (Directed Acyclic Graphs) for setting up a robust data engineering en…
☆25 · Jan 26, 2024 · Updated 2 years ago
lbrack1 / kafka-tutorial
This guide will demonstrate how to deploy a minimal Apache Kafka cluster on Docker and set up producers and consumers using Python. We wi…
☆18 · Nov 15, 2020 · Updated 5 years ago
itversity / ghactivity-aws
End-to-end pipeline using AWS services such as S3, boto3, Lambda, ECR, Step Functions, DynamoDB, etc.
☆23 · Jul 31, 2022 · Updated 3 years ago
idevloping / PyQt5_Matplotlib
Making Desktop App with PyQt5 and matplotlib
☆16 · Dec 19, 2019 · Updated 6 years ago
martandsingh / ApacheSpark
This repository will help you learn Databricks concepts with the help of examples. It will include all the important topics which…
☆105 · Sep 26, 2025 · Updated 10 months ago
abdkumar / spotify-stream-analytics
Generate a synthetic Spotify music stream dataset to create dashboards. Spotify API generates fake event data emitted to Kafka. Spark consu…
☆72 · Dec 17, 2023 · Updated 2 years ago
aws-samples / data-engineering-on-aws
☆22 · Oct 21, 2024 · Updated last year
akuramshin / Follow-Me-Robot
A project for an autonomous robot that follows you (your smartphone).
☆11 · Jan 29, 2021 · Updated 5 years ago
ashishpatel26 / Kubeflow-installation-on-windows-10
Kubeflow installation on Windows 10/11
☆17 · Dec 26, 2022 · Updated 3 years ago
sharmi1206 / covid-19-analysis
Statewide COVID-19 analysis for India using 2011 census data and Kaggle data
☆16 · Sep 20, 2020 · Updated 5 years ago
sspaeti / data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
☆12 · Dec 3, 2024 · Updated last year
dain55788 / ELT-Data-Pipeline
ELT data pipeline implementation in a data warehousing environment
☆31 · May 2, 2025 · Updated last year
AnandDedha / aws-airflow-dataengineering-pipeline
☆21 · Jan 13, 2024 · Updated 2 years ago