tresata/spark-sorted

tresata / spark-sorted

Secondary sort and streaming reduce for Apache Spark

☆77

Alternatives and similar repositories for spark-sorted

Users that are interested in spark-sorted are comparing it to the libraries listed below.

tresata / spark-skewjoin
Joins for skewed datasets in Spark
☆58 · Aug 18, 2017 · Updated 8 years ago
tresata / spark-kafka
Low level integration of Spark and Kafka
☆129 · Mar 15, 2018 · Updated 8 years ago
amplab / spark-indexedrdd
An efficient updatable key-value store for Apache Spark
☆255 · Mar 11, 2017 · Updated 9 years ago
amplab / keystone
Simplifying robust end-to-end machine learning on Apache Spark.
☆473 · Apr 18, 2017 · Updated 9 years ago
AtlasPilotPuppy / SparkAlgorithms
Additional useful algorithms that can be used with spark.
☆24 · Dec 24, 2014 · Updated 11 years ago
radanalyticsio / silex
something to help you spark
☆65 · Oct 23, 2018 · Updated 7 years ago
hammerlab / magic-rdds
Miscellaneous functionality for manipulating Apache Spark RDDs.
☆22 · Dec 29, 2018 · Updated 7 years ago
WeichenXu123 / spark-ml-source-analysis
An analysis of the principles behind Spark ML algorithms and of their concrete source-code implementations
☆10 · Jan 25, 2017 · Updated 9 years ago
sryza / spark-timeseries
A library for time series analysis on Apache Spark
☆1,197 · Oct 13, 2020 · Updated 5 years ago
scalanlp / junto
This toolkit provides an implementation of Modified Adsorption (MAD), a graph-based semi-supervised learning (SSL) algorithm.
☆24 · Jun 20, 2017 · Updated 9 years ago
hammerlab / grafana-spark-dashboards
Scripts for generating Grafana dashboards for monitoring Spark jobs
☆240 · Mar 26, 2015 · Updated 11 years ago
avulanov / ann-benchmark
Benchmarks of artificial neural network library for Spark MLlib
☆11 · Dec 3, 2015 · Updated 10 years ago
koeninger / kafka-exactly-once
☆241 · Jun 14, 2018 · Updated 8 years ago
mengxr / spark-als
Another, hopefully better, implementation of ALS on Spark
☆14 · May 20, 2015 · Updated 11 years ago
zhangyuc / splash
Splash Project for parallel stochastic learning
☆93 · Jun 16, 2017 · Updated 9 years ago
BD2KGenomics / conductor
Efficient, distributed downloads of large files from S3 to HDFS using Spark.
☆17 · Apr 26, 2017 · Updated 9 years ago
TrueCar / mleap
MLeap allows for easily putting Spark ML pipelines into production
☆78 · Oct 27, 2016 · Updated 9 years ago
scalanlp / nak
The Nak Machine Learning Library
☆342 · Jul 18, 2017 · Updated 9 years ago
googlegenomics / spark-examples
Apache Spark jobs such as Principal Coordinate Analysis.
☆77 · Jan 30, 2017 · Updated 9 years ago
markhibberd / pirate
Non-horrible command line parsing.
☆42 · May 9, 2017 · Updated 9 years ago
stripe-archive / herringbone
Tools for working with parquet, impala, and hive
☆135 · Jan 4, 2021 · Updated 5 years ago
hseeberger / akka-log4j
Logging backend for Akka based on Log4j
☆27 · Apr 23, 2020 · Updated 6 years ago
rjagerman / glintlda
Scalable Distributed LDA implementation for Spark & Glint
☆29 · Sep 27, 2016 · Updated 9 years ago
daithiocrualaoich / spark-emr
Spark Elastic MapReduce bootstrap and runnable examples.
☆17 · Jun 26, 2013 · Updated 13 years ago
rkrzewski / akka-cluster-etcd
Akka cluster management using etcd
☆68 · Apr 18, 2016 · Updated 10 years ago
Capgemini / dcos-cli-docker
A docker image for DCOS CLI
☆14 · Jun 7, 2016 · Updated 10 years ago
collectivemedia / spark-hyperloglog
Interactive Audience Analytics with Spark and HyperLogLog
☆55 · Oct 14, 2015 · Updated 10 years ago
mraad / spark-dbf
Spark SQL DBF Library
☆16 · Jan 2, 2015 · Updated 11 years ago
fs111 / vagrant-hadoop-cluster
Deploying apache-hadoop in a virtualized cluster as easy as 1-2-3.
☆15 · Jul 17, 2019 · Updated 7 years ago
lloydmeta / sparkka-streams
Power a Spark Stream from anywhere in your Akka Stream Flow
☆12 · Mar 1, 2016 · Updated 10 years ago
karlhigley / spark-neighbors
Spark-based approximate nearest neighbor search using locality-sensitive hashing
☆104 · Jul 5, 2016 · Updated 10 years ago
holdenk / spark-validator
A library you can include in your Spark job to validate the counters and perform operations on success. Goal is scala/java/python support…
☆111 · Feb 1, 2018 · Updated 8 years ago
tresata / spark-columnar
☆15 · Mar 4, 2015 · Updated 11 years ago
leifwickland / static-analysis-skeleton
Skeleton project with static analysis provided by scalac switches, Wartremover, and Scalastyle
☆29 · Nov 6, 2016 · Updated 9 years ago
mesosphere-backup / dcos-cli-docker
DCOS CLI in a Docker Container
☆10 · Mar 29, 2017 · Updated 9 years ago
med-at-scale / high-health
Integrate the GA4GH schemas and probably a scala impl of the service.
☆14 · May 20, 2016 · Updated 10 years ago
d6y / enumeration-examples
Demonstrates the pros and cons of scala.Enumeration and examines alternative structures
☆18 · Nov 24, 2016 · Updated 9 years ago
agrippa / spark-swat
Automatic offload of user-written Spark kernels to accelerators
☆18 · Oct 25, 2016 · Updated 9 years ago
ogrodnek / spark-plug
scala driver for launching Amazon EMR jobs
☆40 · Feb 10, 2016 · Updated 10 years ago