This project describes how to write a full ETL data pipeline using Spark.
☆15Oct 15, 2022Updated 3 years ago
Alternatives and similar repositories for spark-data-pipeline
Users that are interested in spark-data-pipeline are comparing it to the libraries listed below.
- Kafka Connect connector for receiving data and writing data to Splunk.☆25Nov 7, 2017Updated 8 years ago
- This is an activator project for showcasing how to read & write data from Kafka-cluster using Scala Producer & Consumer API.☆11May 28, 2017Updated 8 years ago
- Built a Data Pipeline for a Retail store using AWS services that collects data from its transactional database (OLTP) in Snowflake and tr…☆12May 25, 2023Updated 2 years ago
- A sink to save Spark Structured Streaming DataFrame into Hive table☆23May 7, 2018Updated 7 years ago
- This is an activator project for showcasing how to read & write data from Kafka-cluster using Java Producer & Consumer API.☆11May 24, 2017Updated 8 years ago
- ☆14Jan 1, 2020Updated 6 years ago
- Simple Spark example of generating table stats for use in data quality checks☆28Apr 28, 2017Updated 8 years ago
- This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.☆40Aug 31, 2016Updated 9 years ago
- This is an activator project for providing a seed for starting with Akka-Http and Slick.☆14May 28, 2017Updated 8 years ago
- My documents for self-learning the fundamentals of data engineering☆14Aug 5, 2023Updated 2 years ago
- Creating Data Pipelines with Apache Airflow to manage ETL from Amazon S3 into Amazon Redshift☆14Jun 12, 2019Updated 6 years ago
- ☆63Nov 8, 2019Updated 6 years ago
- This is an activator project for showcasing best practices, writing unit test and providing a seed for starting with Slick.☆13May 28, 2017Updated 8 years ago
- This is a Play activator project. It describes how to build autocomplete search on Elasticsearch.☆15May 24, 2017Updated 8 years ago
- This project provides valuable customer sentiment insights for Zomato by tracking and analyzing tweets related to their brand and service…☆14Aug 27, 2023Updated 2 years ago
- An Apache Spark app for moving data between Apache Hive and Apache Phoenix/HBase☆14Mar 23, 2016Updated 10 years ago
- low-level helpers for Apache Spark libraries and tests☆16Dec 29, 2018Updated 7 years ago
- A database with automatic dynamic imputation of missing values.☆11Nov 2, 2017Updated 8 years ago
- Kafka Connect connector for CDC data from postgres☆11Aug 27, 2017Updated 8 years ago
- Real-time motion planner and autonomous vehicle simulator in the browser, built with WebGL and Three.js.☆13Mar 3, 2023Updated 3 years ago
- With this library, you can embed Python in your Java or Scala project. The main purpose of this library is to use Python libraries from J…☆12Aug 25, 2024Updated last year
- Docker example with kafka connect and sink☆12Feb 12, 2018Updated 8 years ago
- Code for my talk "Stateful & Reactive Streaming Applications Without a Database" at WeAreDevelopers 2018☆11May 20, 2018Updated 7 years ago
- For this project I am creating an ETL (Extract, Transform, and Load) pipeline using Python, RegEx, and SQL Database. The goal is to retri…☆26Feb 9, 2021Updated 5 years ago
- ☆14Aug 22, 2025Updated 7 months ago
- Reading comprehension for opinion-type questions (challenger.ai)☆10Nov 14, 2018Updated 7 years ago
- Materials for various Hadoop & Nifi related workshops☆52Mar 20, 2019Updated 7 years ago
- https://github.com/uavorg/uavstack☆10Sep 11, 2017Updated 8 years ago
- Real-time sensor data acquisition with Spark Streaming and OpenCV☆13Jun 20, 2016Updated 9 years ago
- Measurement control based on mapboxgl, mapboxgl-draw, and turf☆12Nov 22, 2022Updated 3 years ago
- free bike for everyone☆15Aug 20, 2019Updated 6 years ago
- ☆12Mar 15, 2022Updated 4 years ago
- A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.☆12Apr 1, 2017Updated 9 years ago
- NICTA Named Entity Recogniser is a rule based Named Entity Recogniser which extracts named entities from text such as Organisation, Locat…☆16Apr 15, 2023Updated 3 years ago
- Leveraging Hortonworks' HDP 3.1.0 and HDF 3.4.0 components, this tutorial guides the user through steps to stream data from a REST API in…☆19Aug 16, 2019Updated 6 years ago
- Apartments Data Pipeline using Airflow and Spark.☆24Mar 28, 2022Updated 4 years ago
- Utilities to convert between GIS (multipolygon/multipatch shapefiles) and glTF and b3dm formats☆13Mar 13, 2017Updated 9 years ago
- ☆11Jul 13, 2020Updated 5 years ago
- Context-aware AI dictionary for books, manga & comics. Neural TTS (Piper), IPA generation, PaddleOCR, multi-word lookup. Supports cloud &…☆19Apr 7, 2026Updated last week