Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines
☆134 · Nov 4, 2022 · Updated 3 years ago
Alternatives and similar repositories for docker-spark
Users that are interested in docker-spark are comparing it to the libraries listed below.
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ☆509 · Nov 7, 2025 · Updated 7 months ago
- Docker image for Spark history server on Kubernetes ☆15 · Mar 13, 2020 · Updated 6 years ago
- A workspace to experiment with Apache Spark, Livy, and Airflow in a Docker environment. ☆38 · Mar 29, 2021 · Updated 5 years ago
- Implementation of an ETL process for real-time sentiment analysis of tweets with Docker, Apache Kafka, Spark Streaming, MongoDB and Delta… ☆19 · May 6, 2023 · Updated 3 years ago
- Spark on Kubernetes ☆103 · Feb 20, 2023 · Updated 3 years ago
- Apache Spark docker image ☆2,051 · Apr 20, 2026 · Updated last month
- Docker with Airflow + Postgres + Spark cluster + JDK (spark-submit support) + Jupyter Notebooks ☆24 · Apr 2, 2022 · Updated 4 years ago
- Kafka streaming with Spark and Flink example ☆31 · Jul 16, 2023 · Updated 2 years ago
- Tools and specifications for Semantic Data Dictionaries ☆13 · May 21, 2026 · Updated 3 weeks ago
- Ansible playbooks for deploying a 3 node Kubernetes cluster ☆23 · Nov 24, 2023 · Updated 2 years ago
- ☆11 · Jul 13, 2020 · Updated 5 years ago
- ☆32 · Aug 13, 2018 · Updated 7 years ago
- ☆31 · May 13, 2025 · Updated last year
- This repo contains DAGs demonstrating a variety of ELT patterns using Airflow along with dbt. ☆12 · Jan 12, 2023 · Updated 3 years ago
- A provider package for Kafka ☆35 · Sep 16, 2025 · Updated 9 months ago
- ☆13 · Feb 3, 2026 · Updated 4 months ago
- Create a streaming pipeline using Kafka and Kafka Connect ☆14 · Jun 29, 2020 · Updated 5 years ago
- Scrapes and Analyzes/Compares Nov 1 & Jun 7, 2015 General Election Results in Turkey ☆12 · Nov 19, 2015 · Updated 10 years ago
- Use Airflow to pull in remote data via API, pub/sub, Kinesis, S3, etc., and then store it in S3 for later consumption by other services. ☆13 · Mar 14, 2022 · Updated 4 years ago
- Provider for AWS Redshift entities, e.g. Users, Groups, Permissions, Schemas, Databases ☆47 · Mar 10, 2022 · Updated 4 years ago
- Use your terminal shell to do awesome things. ☆15 · Sep 22, 2020 · Updated 5 years ago
- Persist Pandas objects within a MongoDB database ☆14 · Feb 24, 2026 · Updated 3 months ago
- Dutch data. ☆10 · Jun 2, 2026 · Updated 2 weeks ago
- Introduction to MLflow and Using MLflow with an Anaconda Environment ☆11 · Sep 17, 2020 · Updated 5 years ago
- ☆10 · Dec 4, 2022 · Updated 3 years ago
- ☆11 · Aug 20, 2018 · Updated 7 years ago
- Backtesting.py is an open-source backtesting Python library that allows users to test their trading strategies via code. ☆21 · Feb 18, 2024 · Updated 2 years ago
- ☆13 · May 21, 2021 · Updated 5 years ago
- Sync GitHub issues with todo.txt ☆13 · Sep 11, 2022 · Updated 3 years ago
- CLI Tool for quickly loading file-based datasets into PostgreSQL/PostGIS ☆12 · Apr 22, 2017 · Updated 9 years ago
- Some data science applications on the student mathematics performance data set from the 2010 KDD Cup. ☆10 · Nov 27, 2014 · Updated 11 years ago
- Download the Esri JS API ☆19 · Dec 18, 2015 · Updated 10 years ago
- Distributed Data Systems with Azure Databricks, published by Packt ☆12 · Jan 18, 2023 · Updated 3 years ago
- R package for prepping and analyzing DataHaven's 2019 Community Index ☆12 · May 18, 2026 · Updated last month
- Docker with Airflow and Spark standalone cluster ☆265 · Aug 5, 2023 · Updated 2 years ago
- CloudFormation template to create a VPC and subnets ☆11 · Dec 2, 2021 · Updated 4 years ago
- Tools for extracting metadata from Tableau Desktop workbook files. ☆12 · Mar 31, 2022 · Updated 4 years ago
- kernel spec, config for vanilla kernel rpms from kernel.org ☆10 · Jan 24, 2022 · Updated 4 years ago
- Terraform code to create, update or delete AWS Glue dev endpoint(s) ☆15 · Jul 23, 2019 · Updated 6 years ago