This project describes how to write a full ETL data pipeline using Spark.
☆15Oct 15, 2022Updated 3 years ago
Alternatives and similar repositories for spark-data-pipeline
Users that are interested in spark-data-pipeline are comparing it to the libraries listed below.
- Kafka Connect connector for receiving data and writing data to Splunk.☆25Nov 7, 2017Updated 8 years ago
- This is an activator project for showcasing how to read & write data from Kafka-cluster using Scala Producer & Consumer API.☆11May 28, 2017Updated 8 years ago
- This is an activator project for showcasing how to read & write data from Kafka-cluster using Java Producer & Consumer API.☆11May 24, 2017Updated 8 years ago
- This project has the sample programs for the Azure Databricks technical enablement workshop!☆12Jul 25, 2019Updated 6 years ago
- Simple Spark example of generating table stats for use of data quality checks☆27Apr 28, 2017Updated 9 years ago
- This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.☆40Aug 31, 2016Updated 9 years ago
- ☆63Nov 8, 2019Updated 6 years ago
- This is an activator project for showcasing best practices, writing unit test and providing a seed for starting with Slick.☆13May 28, 2017Updated 8 years ago
- This is an activator project providing a seed for starting with Play & Slick using AngularJS☆14May 24, 2017Updated 8 years ago
- An Apache Spark app for making data movement between Apache Hive and Apache Phoenix/HBase☆14Mar 23, 2016Updated 10 years ago
- Create Kafka-Connect clusters with Docker. You put the Kafka, we put the Connect.☆25Mar 27, 2019Updated 7 years ago
- Ansible scripts for deploying Kafka on EC2☆10Oct 7, 2016Updated 9 years ago
- Kafka Sink Connect OrientDB https://www.confluent.io/hub/sanjuthomas/kafka-connect-orientdb☆10Jan 26, 2026Updated 3 months ago
- Kafka Connect connector for CDC data from postgres☆11Aug 27, 2017Updated 8 years ago
- Real-time motion planner and autonomous vehicle simulator in the browser, built with WebGL and Three.js.☆13Mar 3, 2023Updated 3 years ago
- With this library, you can embed Python to your Java or Scala project. The main purpose of this library is to use Python libraries from J…☆12Aug 25, 2024Updated last year
- Docker example with kafka connect and sink☆12Feb 12, 2018Updated 8 years ago
- Code for my talk "Stateful & Reactive Streaming Applications Without a Database" at WeAreDevelopers 2018☆11May 20, 2018Updated 7 years ago
- ☆20Apr 27, 2012Updated 14 years ago
- 🎁 Shows recommended files in Nextcloud☆15Updated this week
- ☆14Aug 22, 2025Updated 8 months ago
- Reading comprehension for opinion-type questions (challenger.ai)☆10Nov 14, 2018Updated 7 years ago
- Template for a DuckDB-based, Codespace-oriented sandbox project that is also dbt Cloud compatible, and includes code-first BI tooling via…☆17Apr 7, 2023Updated 3 years ago
- Materials for various Hadoop & Nifi related workshops☆51Mar 20, 2019Updated 7 years ago
- A description of the processes and techniques required to migrate a relational schema to a Cassandra database using Spark and SparkSQL☆11Jan 27, 2018Updated 8 years ago
- Measurement control based on mapboxgl, mapboxgl-draw, and turf☆12Nov 22, 2022Updated 3 years ago
- Event ticketing system with Next.js and Appwrite☆10Jun 22, 2023Updated 2 years ago
- ☆12Mar 15, 2022Updated 4 years ago
- iPython Notebook of the Guide to Data Mining☆20Apr 7, 2013Updated 13 years ago
- A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.☆12Apr 1, 2017Updated 9 years ago
- Backend for a CIM base development platform, built on the RuoYi framework (BIM+GIS)☆11May 25, 2022Updated 3 years ago
- Leveraging Hortonworks' HDP 3.1.0 and HDF 3.4.0 components, this tutorial guides the user through steps to stream data from a REST API in…☆19Aug 16, 2019Updated 6 years ago
- A module extracting the data from PostGIS to mbtiles by using tippecanoe.☆16Jan 31, 2026Updated 3 months ago
- ☆11Jul 13, 2020Updated 5 years ago
- Context-aware AI dictionary for books, manga & comics. Neural TTS (Piper), IPA generation, PaddleOCR, multi-word lookup. Supports cloud &…☆19Apr 20, 2026Updated 2 weeks ago
- Legoo: A collection of automation modules to build analytics infrastructure☆20Jul 24, 2020Updated 5 years ago
- Code for Tutorial on designing clickstream analytics application using Hadoop☆55May 20, 2015Updated 10 years ago
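- NewsApp includes client source code, server source code, and database files. It is built on Recognition Emotion from Microsoft's AI project ProjectOxford and mainly pushes different categories of news based on the user's facial expression. For the Emotion API, see: https://www.p…☆10Mar 2, 2016Updated 10 years ago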
- Spark stream from kafka(json) to s3(parquet)☆15Nov 8, 2018Updated 7 years ago