techmonad/spark-data-pipeline

This project describes how to write a full ETL data pipeline using Spark.

☆15

Alternatives and similar repositories for spark-data-pipeline

Users who are interested in spark-data-pipeline are comparing it to the libraries listed below.

vbounyasit / MyDataFramework
An ETL framework in Scala for data engineers
☆23 · Aug 30, 2022 · Updated 3 years ago
hortonworks-spark / spark-hive-streaming-sink
A sink that saves a Spark Structured Streaming DataFrame into a Hive table
☆23 · May 7, 2018 · Updated 8 years ago
Hamza88-coder / Real-Time-Recruitment-System-with-AI-and-Data-Analytics
Simulation of job offers and CVs with real-time processing, classification, and analytics using Kafka, Ray, Spark, and Databricks. Includ…
☆14 · Dec 25, 2024 · Updated last year
airbnb / sputnik
☆64 · Nov 8, 2019 · Updated 6 years ago
randerzander / HiveToPhoenix
An Apache Spark app for moving data between Apache Hive and Apache Phoenix/HBase
☆14 · Mar 23, 2016 · Updated 10 years ago
lensesio / fast-data-connect-cluster
Create Kafka Connect clusters with Docker. You put the Kafka, we put the Connect.
☆25 · Mar 27, 2019 · Updated 7 years ago
hammerlab / spark-util
Low-level helpers for Apache Spark libraries and tests
☆16 · Dec 29, 2018 · Updated 7 years ago
milinda / KafkaOnEC2
Ansible scripts for deploying Kafka on EC2
☆10 · Oct 7, 2016 · Updated 9 years ago
sanjuthomas / kafka-connect-orientdb
Kafka Sink Connect OrientDB https://www.confluent.io/hub/sanjuthomas/kafka-connect-orientdb
☆10 · Jan 26, 2026 · Updated 5 months ago
jcustenborder / kafka-connect-cdc-postgres
Kafka Connect connector for CDC data from Postgres
☆11 · Aug 27, 2017 · Updated 8 years ago
kashizui / Stanford-CS109-Notes
Comprehensive typeset notes for Stanford's CS 109 probability course.
☆12 · Jun 24, 2015 · Updated 11 years ago
zhanghua19830528 / cim-basic-platform-backend
Backend of the CIM basic development platform, built on the RuoYi framework (BIM+GIS)
☆11 · May 25, 2022 · Updated 4 years ago
guedim / postgres-kafka-elastic
Docker example with Kafka Connect and a sink
☆12 · Feb 12, 2018 · Updated 8 years ago
hpgrahsl / wearedevs-2018
Code for my talk "Stateful & Reactive Streaming Applications Without a Database" at WeAreDevelopers 2018
☆11 · May 20, 2018 · Updated 8 years ago
rc-dukes / dash2
Real-time motion planner and autonomous vehicle simulator in the browser, built with WebGL and Three.js.
☆13 · Jun 25, 2026 · Updated last month
trulia / thoth-ml
☆15 · Jan 3, 2015 · Updated 11 years ago
trulia / node-optimizely
Runs Optimizely experiments in Node using either jsdom (slow & stable) or cheerio+node-vm (young blood)
☆15 · Sep 1, 2017 · Updated 8 years ago
JerckyLY / mapboxgl-measure-tools
Measurement controls built on mapboxgl, mapboxgl-draw, and turf
☆12 · Nov 22, 2022 · Updated 3 years ago
prospa-group-oss / interview-test-data-engineer
☆11 · Jul 13, 2020 · Updated 6 years ago
sidneyocirqueira / azure-synapse-analytics
☆12 · Mar 15, 2022 · Updated 4 years ago
charlesb / CDF-workshop
Leveraging Hortonworks' HDP 3.1.0 and HDF 3.4.0 components, this tutorial guides the user through steps to stream data from a REST API in…
☆19 · Aug 16, 2019 · Updated 6 years ago
candrsn / gis-gltf
Utilities to convert between GIS (multipolygon/multipatch shapefiles) and glTF and b3dm formats
☆13 · Mar 13, 2017 · Updated 9 years ago
NICTA / nicta-ner
NICTA Named Entity Recogniser is a rule-based Named Entity Recogniser which extracts named entities from text such as Organisation, Locat…
☆16 · Apr 15, 2023 · Updated 3 years ago
watergis / postgis2mbtiles
A module extracting the data from PostGIS to mbtiles by using tippecanoe.
☆16 · Jan 31, 2026 · Updated 5 months ago
hadooparchitecturebook / clickstream-tutorial
Code for a tutorial on designing a clickstream analytics application using Hadoop
☆54 · May 20, 2015 · Updated 11 years ago
AasTrailblazers / AzureSynapse
☆15 · Jan 17, 2022 · Updated 4 years ago
v-shinc / KBQA
KBQA
☆14 · Mar 13, 2017 · Updated 9 years ago
cszhangyi / NewsApp
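NewsApp contains the client source code, the server source code, and the database files. It is built on Recognition Emotion from Microsoft's AI project ProjectOxford and mainly pushes different categories of news based on the user's facial expressions. For the Emotion API, see: https://www.p…
☆10 · Mar 2, 2016 · Updated 10 years ago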
mengxr / spark-als
Another, hopefully better, implementation of ALS on Spark
☆14 · May 20, 2015 · Updated 11 years ago
haidaoxiaofei / GPS2RoadNetwork
Reading list of the topic about utilizing vehicle generated GPS data to update road networks
☆14 · Jul 18, 2018 · Updated 8 years ago
gianlucahmd / deeplearning_andrewNG_notes
Course notes for Andrew Ng's Deep Learning course on Coursera
☆13 · Aug 12, 2017 · Updated 8 years ago
AiAnonymousPT / COURSE-LLMs-from-Scratch
Collaborative course building Large Language Models from scratch
☆12 · Apr 2, 2025 · Updated last year
meltano / jaffle-shop-template
Template for a DuckDB-based, Codespace-oriented sandbox project that is also dbt Cloud compatible, and includes code-first BI tooling via…
☆17 · Apr 7, 2023 · Updated 3 years ago
bigsnarfdude / guide-to-data-mining
IPython notebook of the Guide to Data Mining
☆20 · Apr 7, 2013 · Updated 13 years ago
golang-vietnam / gophercon-2018
For organization discussion and materials
☆13 · Dec 10, 2018 · Updated 7 years ago
rsethur / SparkStreamingGeoFencing
Real-time geofencing using Spark Streaming for a vehicle tracking / fleet management use case
☆12 · Jul 22, 2019 · Updated 7 years ago
leftnoteasy / pymining
Python data mining platform
☆16 · Jan 17, 2013 · Updated 13 years ago
piotr-kalanski / data-quality-monitoring
Data Quality Monitoring Tool
☆15 · Dec 5, 2017 · Updated 8 years ago
Cascading / cascading.samples
Sample applications using Cascading
☆20 · Jun 7, 2015 · Updated 11 years ago