tresata/spark-skewjoin

Joins for skewed datasets in Spark

☆58

Alternatives and similar repositories for spark-skewjoin

Users who are interested in spark-skewjoin are comparing it to the libraries listed below.

tresata / spark-sorted
Secondary sort and streaming reduce for Apache Spark
☆77 · Jul 3, 2023 · Updated 3 years ago
radanalyticsio / silex
something to help you spark
☆65 · Oct 23, 2018 · Updated 7 years ago
bbejeck / spark-experiments
Repo with sources for Spark blog posts and learning experiments in Spark
☆14 · Oct 16, 2015 · Updated 10 years ago
syoummer / SpatialSpark
Big Spatial Data Processing using Spark
☆146 · Mar 7, 2017 · Updated 9 years ago
bgnkim / ScalaNetwork
A Neural network implementation with Scala
☆20 · Jul 17, 2016 · Updated 10 years ago
alteryx / sparkGLM
An R-like GLM package for Apache Spark
☆10 · Aug 6, 2015 · Updated 10 years ago
drubbo / SparkGIS
GIS extension for SparkSQL
☆40 · Jan 25, 2016 · Updated 10 years ago
intenthq / pucket
Bucketing and partitioning system for Parquet
☆30 · May 22, 2018 · Updated 8 years ago
mkubala / typesafe-config-examples
A few straightforward examples that show how to use Typesafe's Config library and HOCON.
☆10 · Oct 9, 2013 · Updated 12 years ago
springnz / sparkplug
A framework for creating composable and pluggable data processing pipelines using Apache Spark, and running them on a cluster.
☆47 · Aug 1, 2016 · Updated 9 years ago
gingsmith / proxcocoa
A primal-dual framework for distributed L1-regularized optimization
☆37 · Apr 18, 2016 · Updated 10 years ago
collectivemedia / spark-hyperloglog
Interactive Audience Analytics with Spark and HyperLogLog
☆55 · Oct 14, 2015 · Updated 10 years ago
jetoile / resteasy-netty-sample
Sample of resteasy-netty project
☆17 · Jun 25, 2015 · Updated 11 years ago
BenFradet / spark-kaggle
Different entries to kaggle contests using Apache Spark
☆13 · Jun 5, 2017 · Updated 9 years ago
amplab / SparkNet
Distributed Neural Networks for Spark
☆609 · Jul 23, 2020 · Updated 5 years ago
avulanov / ann-benchmark
Benchmarks of artificial neural network library for Spark MLlib
☆11 · Dec 3, 2015 · Updated 10 years ago
hammerlab / spree
Live-updating Spark UI built with Meteor
☆190 · Apr 6, 2021 · Updated 5 years ago
darkjh / scalaflow
Fluent Scala DSL for Google's Cloud Dataflow SDK
☆56 · Aug 2, 2015 · Updated 10 years ago
adrianulbona / borders
☆17 · Jan 25, 2017 · Updated 9 years ago
ahanwadi / paxos
Implementation of Paxos
☆21 · Apr 4, 2015 · Updated 11 years ago
bentaylordata / datascience
Data science repo to help others
☆12 · Feb 10, 2016 · Updated 10 years ago
markt-asf / memory-leaks
Sample code for demonstrating and exploring class loader related memory leaks
☆15 · Mar 29, 2018 · Updated 8 years ago
collectivemedia / modelmatrix
Sparse feature extraction with Spark
☆30 · Jul 25, 2018 · Updated 7 years ago
AtlasPilotPuppy / SparkAlgorithms
Additional useful algorithms that can be used with Spark.
☆24 · Dec 24, 2014 · Updated 11 years ago
clamm / spark-location-history
Application that visualizes your Google location history in the form of a heatmap, using Spark to aggregate the data.
☆12 · Feb 19, 2015 · Updated 11 years ago
yods / storm-ml-play
Experiments with Vowpal Wabbit Machine Learning & Storm
☆26 · Apr 29, 2013 · Updated 13 years ago
ehsanmok / sparkling-titanic
Training models with Apache Spark and PySpark for the Titanic Kaggle competition
☆14 · Sep 23, 2016 · Updated 9 years ago
bloomreach / solrcloud-rebalance-api
SolrCloud Rebalance API Documentation
☆13 · Jul 18, 2016 · Updated 10 years ago
akopich / dplsa
Distributed implementation of Robust PLSA using Spark
☆12 · Apr 29, 2021 · Updated 5 years ago
lightingLYG / saiku3
The second development version, based on the release-3.8 branch of saiku.
☆25 · Dec 28, 2016 · Updated 9 years ago
amplab / spark-indexedrdd
An efficient updatable key-value store for Apache Spark
☆255 · Mar 11, 2017 · Updated 9 years ago
holdenk / spark-validator
A library you can include in your Spark job to validate the counters and perform operations on success. Goal is scala/java/python support…
☆111 · Feb 1, 2018 · Updated 8 years ago
hammer / avro-tools
A collection of tools that help me work with Avro
☆23 · Jan 7, 2010 · Updated 16 years ago
isarn / isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
☆29 · May 27, 2024 · Updated 2 years ago
hammerlab / spark-bam
Load genomic BAM files using Apache Spark
☆21 · Jun 17, 2018 · Updated 8 years ago
databricks / spark-perf
Performance tests for Apache Spark
☆392 · Jul 9, 2018 · Updated 8 years ago
lagerspetz / TimeSeriesSpark
Time series and energy data analysis API for Spark.
☆19 · May 1, 2012 · Updated 14 years ago
collectivemedia / spark-ext
Spark Extension: ML transformers, SQL aggregations, etc. that are missing in Apache Spark
☆145 · Jan 26, 2016 · Updated 10 years ago
twitter / chill
Scala extensions for the Kryo serialization library
☆618 · Aug 19, 2024 · Updated last year