This project describes how to write a full ETL data pipeline using Spark.
☆15 · Oct 15, 2022 · Updated 3 years ago
Alternatives and similar repositories for spark-data-pipeline
Users who are interested in spark-data-pipeline are comparing it to the libraries listed below.
- Kafka Connect connector for receiving data and writing data to Splunk. ☆25 · Nov 7, 2017 · Updated 8 years ago
- This is an activator project for showcasing how to read & write data from Kafka-cluster using Scala Producer & Consumer API. ☆11 · May 28, 2017 · Updated 8 years ago
- ☆14 · Jan 1, 2020 · Updated 6 years ago
- This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch. ☆40 · Aug 31, 2016 · Updated 9 years ago
- This is an activator project for providing a seed for starting with Akka-Http and Slick. ☆14 · May 28, 2017 · Updated 8 years ago
- This is an activator project providing a seed for starting with Play & Slick using AngularJS. ☆14 · May 24, 2017 · Updated 8 years ago
- An Apache Spark app for moving data between Apache Hive and Apache Phoenix/HBase. ☆14 · Mar 23, 2016 · Updated 10 years ago
- Low-level helpers for Apache Spark libraries and tests. ☆16 · Dec 29, 2018 · Updated 7 years ago
- Simulation of job offers and CVs with real-time processing, classification, and analytics using Kafka, Ray, Spark, and Databricks. Includ… ☆14 · Dec 25, 2024 · Updated last year
- Create Kafka Connect clusters with Docker. You put the Kafka, we put the Connect. ☆25 · Mar 27, 2019 · Updated 6 years ago
- Ansible scripts for deploying Kafka on EC2. ☆10 · Oct 7, 2016 · Updated 9 years ago
- ☆10 · May 25, 2017 · Updated 8 years ago
- A database with automatic dynamic imputation of missing values. ☆11 · Nov 2, 2017 · Updated 8 years ago
- Kafka Connect connector for CDC data from Postgres. ☆11 · Aug 27, 2017 · Updated 8 years ago
- Real-time motion planner and autonomous vehicle simulator in the browser, built with WebGL and Three.js. ☆13 · Mar 3, 2023 · Updated 3 years ago
- Comprehensive typeset notes for Stanford's CS 109 probability course. ☆12 · Jun 24, 2015 · Updated 10 years ago
- With this library, you can embed Python in your Java or Scala project. The main purpose of this library is to use Python libraries from J… ☆12 · Aug 25, 2024 · Updated last year
- Docker example with Kafka Connect and sink. ☆12 · Feb 12, 2018 · Updated 8 years ago
- Code for my talk "Stateful & Reactive Streaming Applications Without a Database" at WeAreDevelopers 2018. ☆11 · May 20, 2018 · Updated 7 years ago
- ☆13 · Aug 22, 2025 · Updated 7 months ago
- Template for a DuckDB-based, Codespace-oriented sandbox project that is also dbt Cloud compatible, and includes code-first BI tooling via… ☆16 · Apr 7, 2023 · Updated 2 years ago
- Real-time sensor data acquisition with Spark Streaming and OpenCV. ☆13 · Jun 20, 2016 · Updated 9 years ago
- A measurement control based on mapboxgl, mapboxgl-draw, and turf. ☆12 · Nov 22, 2022 · Updated 3 years ago
- Free bike for everyone. ☆15 · Aug 20, 2019 · Updated 6 years ago
- ☆12 · Mar 15, 2022 · Updated 4 years ago
- A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc. ☆12 · Apr 1, 2017 · Updated 8 years ago
- A module extracting the data from PostGIS to mbtiles by using tippecanoe. ☆16 · Jan 31, 2026 · Updated last month
- Context-aware AI dictionary for books, manga & comics. Neural TTS (Piper), IPA generation, PaddleOCR, multi-word lookup. Supports cloud &… ☆19 · Feb 5, 2026 · Updated last month
- ☆15 · Jan 17, 2022 · Updated 4 years ago
- Legoo: A collection of automation modules to build analytics infrastructure. ☆20 · Jul 24, 2020 · Updated 5 years ago
- Code for Tutorial on designing clickstream analytics application using Hadoop. ☆55 · May 20, 2015 · Updated 10 years ago
- NewsApp includes the client source code, server source code, and database files. It is built on Recognition Emotion from Microsoft's AI project ProjectOxford, and mainly pushes different categories of news based on the user's facial expressions. For the Emotion API, see: https://www.p… ☆10 · Mar 2, 2016 · Updated 10 years ago
- Clip geographic data into MVT files based on Apache Spark. ☆19 · May 19, 2017 · Updated 8 years ago
- Data Quality Monitoring Tool. ☆15 · Dec 5, 2017 · Updated 8 years ago
- A MapReduce implementation of HashToMin for finding Connected Components in a graph. ☆10 · May 18, 2016 · Updated 9 years ago
- For organization discussion and materials. ☆13 · Dec 10, 2018 · Updated 7 years ago
- Data Generators -> Kafka -> Spark Streaming -> PostgreSQL -> Grafana. ☆11 · Jan 31, 2023 · Updated 3 years ago
- This is an experimental exercise I'm using to develop a point of view on ingesting messages from IoT devices and persisting those message… ☆12 · Apr 24, 2018 · Updated 7 years ago
- Kafka Connect to HBase. ☆18 · Mar 31, 2019 · Updated 6 years ago