tresata / spark-skewjoin
Joins for skewed datasets in Spark
☆57 · Aug 18, 2017 · Updated 8 years ago
Alternatives and similar repositories for spark-skewjoin
Users who are interested in spark-skewjoin are comparing it to the libraries listed below.
- Secondary sort and streaming reduce for Apache Spark ☆78 · Jul 3, 2023 · Updated 2 years ago
- something to help you spark ☆64 · Oct 23, 2018 · Updated 7 years ago
- Big Spatial Data Processing using Spark ☆147 · Mar 7, 2017 · Updated 8 years ago
- Benchmarks of the artificial neural network library for Spark MLlib ☆11 · Dec 3, 2015 · Updated 10 years ago
- GIS extension for SparkSQL ☆39 · Jan 25, 2016 · Updated 10 years ago
- A neural network implementation in Scala ☆20 · Jul 17, 2016 · Updated 9 years ago
- The code for the in-memory data pipeline that was presented at Berlin Buzzwords 2015. ☆10 · Jun 1, 2015 · Updated 10 years ago
- Reactive Factorization Engine ☆104 · Feb 18, 2015 · Updated 10 years ago
- An R-like GLM package for Apache Spark ☆10 · Aug 6, 2015 · Updated 10 years ago
- A few straightforward examples that show how to use Typesafe's Config library and HOCON. ☆10 · Oct 9, 2013 · Updated 12 years ago
- The machine learning component of Open Network Insight: scalable analytics combining spark for big data and C / MPI for high performance … ☆13 · Nov 9, 2016 · Updated 9 years ago
- Sparse feature extraction with Spark ☆30 · Jul 25, 2018 · Updated 7 years ago
- A framework for creating composable and pluggable data processing pipelines using Apache Spark, and running them on a cluster. ☆47 · Aug 1, 2016 · Updated 9 years ago
- Repo with sources for Spark blog posts and learning experiments in Spark ☆14 · Oct 16, 2015 · Updated 10 years ago
- Scalable PCA (sPCA) is a scalable implementation of the principal component analysis algorithm on top of Spark ☆12 · May 12, 2015 · Updated 10 years ago
- Additional useful algorithms that can be used with Spark. ☆24 · Dec 24, 2014 · Updated 11 years ago
- Bucketing and partitioning system for Parquet ☆30 · May 22, 2018 · Updated 7 years ago
- Application that visualizes your Google location history in the form of a heatmap, using Spark to aggregate the data. ☆12 · Feb 19, 2015 · Updated 10 years ago
- Distributed implementation of Robust PLSA using Spark ☆12 · Apr 29, 2021 · Updated 4 years ago
- Code for the blog posts "Refactorer Future[Option[T]]" on ☆12 · Jun 14, 2017 · Updated 8 years ago
- Omnivore Optimizer and Distributed CcT ☆13 · Jun 17, 2016 · Updated 9 years ago
- ☆17 · Jan 25, 2017 · Updated 9 years ago
- Sample resteasy-netty project ☆17 · Jun 25, 2015 · Updated 10 years ago
- Fluent Scala DSL for Google's Cloud Dataflow SDK ☆56 · Aug 2, 2015 · Updated 10 years ago
- Time series and energy data analysis API for Spark. ☆19 · May 1, 2012 · Updated 13 years ago
- Repository for the Spark-Vector connector ☆20 · Jul 7, 2021 · Updated 4 years ago
- Live-updating Spark UI built with Meteor ☆189 · Apr 6, 2021 · Updated 4 years ago
- Distributed Neural Networks for Spark ☆611 · Jul 23, 2020 · Updated 5 years ago
- Scala extensions for the Kryo serialization library ☆618 · Aug 19, 2024 · Updated last year
- A library you can include in your Spark job to validate the counters and perform operations on success. The goal is Scala/Java/Python support… ☆108 · Feb 1, 2018 · Updated 8 years ago
- An efficient updatable key-value store for Apache Spark ☆254 · Mar 11, 2017 · Updated 8 years ago
- SBT project showing how to shade a library with SBT assembly ☆15 · Oct 4, 2018 · Updated 7 years ago
- Capstan example project for Java applications ☆20 · Sep 4, 2016 · Updated 9 years ago
- ☆40 · Feb 1, 2017 · Updated 9 years ago
- Spark Extension: ML transformers, SQL aggregations, etc. that are missing in Apache Spark ☆146 · Jan 26, 2016 · Updated 10 years ago
- something to help you spark ☆19 · Jul 5, 2017 · Updated 8 years ago
- A distributed implementation of AdaBoost.MH and MP-Boost using Apache Spark ☆18 · Jul 7, 2016 · Updated 9 years ago
- DEPRECATED! Use the https://github.com/h2oai/sparkling-water repository! H2O and Spark interoperability based on Tachyon. ☆44 · Nov 25, 2014 · Updated 11 years ago
- Spark SQL index for Parquet tables ☆134 · May 6, 2021 · Updated 4 years ago