snowplow-archive/spark-example-project

A Spark WordCountJob example as a standalone SBT project with Specs2 tests, runnable on Amazon EMR

☆119

Alternatives and similar repositories for spark-example-project

Users who are interested in spark-example-project are comparing it to the repositories listed below.

snowplow-archive / spark-streaming-example-project
A Spark Streaming job reading events from Amazon Kinesis and writing event counts to DynamoDB
☆94 · Oct 1, 2020 · Updated 5 years ago
tdas / spark-streaming-benchmark
☆11 · Aug 14, 2014 · Updated 11 years ago
tuplejump / calliope
Calliope is a library integrating Cassandra with the Spark framework.
☆27 · May 1, 2015 · Updated 11 years ago
melphi / spark-examples
Spark examples
☆40 · May 7, 2024 · Updated 2 years ago
plaa / mongo-spark
Example application showing how to use the mongo-hadoop connector with Spark
☆90 · Feb 18, 2014 · Updated 12 years ago
googlegenomics / spark-examples
Apache Spark jobs such as Principal Coordinate Analysis.
☆77 · Jan 30, 2017 · Updated 9 years ago
criteo / vizsql
Scala and SQL happy together.
☆29 · Dec 13, 2016 · Updated 9 years ago
aws-samples / emr-bootstrap-actions
This repository holds the Amazon Elastic MapReduce sample bootstrap actions
☆613 · Jun 5, 2023 · Updated 3 years ago
DuchessFrance / spark-in-practice-scala
Play with the Spark, Spark Streaming, and DataFrame APIs.
☆12 · Jun 26, 2015 · Updated 11 years ago
VeritoneAlpha / spark-job-rest
☆33 · Jan 9, 2016 · Updated 10 years ago
seglo / learning-spark
Practical examples of using Apache Spark in several different use cases
☆103 · Jun 29, 2016 · Updated 10 years ago
hakanilter / aws-emr-examples
Some AWS EMR examples
☆16 · Jan 18, 2018 · Updated 8 years ago
InsightDataScience / kafka-streams-examples
☆14 · Jun 27, 2017 · Updated 9 years ago
flydata / redshift-benchmark
☆52 · Jan 28, 2014 · Updated 12 years ago
JerryLead / SparkLearning
Learning to write Spark examples
☆159 · Aug 20, 2014 · Updated 11 years ago
databricks / spark-pr-dashboard
Dashboard to aid in Spark pull request reviews
☆55 · Mar 30, 2023 · Updated 3 years ago
spirom / LearningSpark
Scala examples for learning to use Spark
☆442 · Sep 17, 2020 · Updated 5 years ago
databricks / reference-apps
Spark reference applications
☆649 · Oct 3, 2024 · Updated last year
VeritoneAlpha / jaws-spark-sql-rest
☆91 · Apr 17, 2017 · Updated 9 years ago
velvia / spark-sql-gdelt
Scripts and code to import the GDELT dataset into Spark SQL for analysis
☆17 · Aug 29, 2014 · Updated 11 years ago
hammerlab / grafana-spark-dashboards
Scripts for generating Grafana dashboards for monitoring Spark jobs
☆239 · Mar 26, 2015 · Updated 11 years ago
databricks / simr
Spark In MapReduce (SIMR) - launching Spark applications on existing Hadoop MapReduce infrastructure
☆44 · Mar 9, 2022 · Updated 4 years ago
evancasey / spark-knn-recommender
Item and User-based KNN recommendation algorithms using PySpark
☆124 · Nov 14, 2017 · Updated 8 years ago
databricks / spark-integration-tests
Integration tests for Spark
☆67 · May 20, 2023 · Updated 3 years ago
snowplow / dataflow-runner
Run templatable playbooks of Hadoop/Spark/et al jobs on Amazon EMR
☆19 · Jun 8, 2026 · Updated last month
helena / spark-cassandra
An Akka Extension for easy integration of Spark and Cassandra in Akka microservices.
☆24 · Sep 25, 2014 · Updated 11 years ago
gwenshap / SparkStreamingExample
☆55 · Aug 21, 2014 · Updated 11 years ago
gbraccialli / SparkUtils
☆11 · Dec 10, 2015 · Updated 10 years ago
massie / spark-parquet-example
Example project to show how to use Spark to read and write Avro/Parquet files
☆50 · Aug 21, 2013 · Updated 12 years ago
OopsOutOfMemory / spark-sql-hbase
A Spark SQL HBase connector
☆29 · May 4, 2015 · Updated 11 years ago
dipanjanS / BerkeleyX-CS190.1x-Scalable-Machine-Learning
This repository contains code files, specifically IPython notebooks, for the assignments in the course "Scalable Machine Learning" by UC Be…
☆31 · Jul 12, 2015 · Updated 11 years ago
markmo / featurestore
Building blocks and patterns for building data prep transformations and feature engineering in Spark.
☆16 · Mar 16, 2016 · Updated 10 years ago
pk11 / rxnetty-router
A tiny HTTP Router for RxNetty
☆18 · Apr 20, 2016 · Updated 10 years ago
metamx / tranquility
Tranquility helps you send real-time event streams to Druid and handles partitioning, replication, service discovery, and schema rollover…
☆13 · May 3, 2019 · Updated 7 years ago
zeridon / aws-ssh-scp-connector
A bash wrapper to help you connect to your instances
☆15 · May 20, 2016 · Updated 10 years ago
tubular / confluent-spark-avro
Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.
☆20 · Jan 11, 2018 · Updated 8 years ago
moertel / sQucumber-redshift
Cucumber-based framework for defining and executing SQL unit, integration and acceptance tests (for AWS Redshift, PostgreSQL)
☆13 · Sep 30, 2020 · Updated 5 years ago
hubertp / prefuse-type-debugger
Type debugger that uses the logging infrastructure of the Scala compiler to gather information and the prefuse library for its UI.
☆17 · Aug 8, 2012 · Updated 13 years ago
techmonad / spark-data-pipeline
This project describes how to write a full ETL data pipeline using Spark.
☆15 · Oct 15, 2022 · Updated 3 years ago