collectivemedia/spark-hyperloglog

Interactive Audience Analytics with Spark and HyperLogLog

☆55

Alternatives and similar repositories for spark-hyperloglog

Users interested in spark-hyperloglog also compare it to the libraries listed below.

AdRoll / cantor
Cantor provides utilities for estimating the cardinality of large sets.
☆85 · Apr 12, 2022 · Updated 4 years ago
collectivemedia / spark-ext
Spark Extension: ML transformers, SQL aggregations, etc. that are missing in Apache Spark
☆145 · Jan 26, 2016 · Updated 10 years ago
memsql / streamliner-starter
Starter project for building MemSQL Streamliner Pipelines
☆32 · Apr 18, 2017 · Updated 9 years ago
velvia / cassandra-gdelt
Experiments with the GDELT dataset and Cassandra schemas.
☆25 · Feb 9, 2016 · Updated 10 years ago
tuplejump / embedded-kafka
Embedded Kafka for testing and quick prototyping.
☆14 · Apr 19, 2016 · Updated 10 years ago
holdenk / fastdataprocessingwithsparkexamples
Examples for Fast Data Processing with Spark
☆59 · Sep 10, 2013 · Updated 12 years ago
tuplejump / snackfs
HDFS-compatible distributed filesystem backed by Cassandra
☆25 · Sep 17, 2015 · Updated 10 years ago
zinniasystems / spark-ml-class
Coursera Machine Learning class examples in Spark
☆42 · Feb 14, 2014 · Updated 12 years ago
scalding-io / social-media-analytics
Social Media Data Mining and Analytics - HyperLogLog, BloomFilter and CountMinSketch with Scalding & Algebird
☆27 · Oct 6, 2018 · Updated 7 years ago
GlobalWebIndex / ember-clothier
Decorators/State & View Models for Ember.js applications
☆11 · Sep 9, 2016 · Updated 9 years ago
mrsqueeze / spark-hash
Locality Sensitive Hashing for Apache Spark
☆198 · Nov 1, 2016 · Updated 9 years ago
clamm / spark-location-history
Application that visualizes your Google location history as a heatmap, using Spark to aggregate the data.
☆12 · Feb 19, 2015 · Updated 11 years ago
databricks / simr
Spark In MapReduce (SIMR) - launching Spark applications on existing Hadoop MapReduce infrastructure
☆44 · Mar 9, 2022 · Updated 4 years ago
bitphy / argo-cron
argo-cron
☆14 · Feb 17, 2020 · Updated 6 years ago
broxtronix / spark-gce
A tool for running Spark on Google Compute Engine
☆16 · Jan 20, 2017 · Updated 9 years ago
r-spark / sparkhello
sparkhello: Scala to Spark - Hello World
☆19 · Jul 12, 2017 · Updated 9 years ago
pwendell / spark-twitter-collection
Spark example of collecting tweets and loading into HDFS/S3
☆42 · Oct 2, 2013 · Updated 12 years ago
indigo-dc / ansible-role-hadoop
Ansible Role to install a Hadoop Cluster
☆10 · Sep 21, 2020 · Updated 5 years ago
tresata / spark-skewjoin
Joins for skewed datasets in Spark
☆58 · Aug 18, 2017 · Updated 8 years ago
torquebox / torquespec
Integration testing with TorqueBox
☆17 · Apr 8, 2015 · Updated 11 years ago
metamx / scala-util
Scala stuff
☆18 · Jun 13, 2019 · Updated 7 years ago
dbist / KafkaHBaseBenchmark
☆11 · Oct 8, 2015 · Updated 10 years ago
eigengo / phillyete2014
Philly ETE Reactive APIs talk
☆17 · Aug 26, 2015 · Updated 10 years ago
bellettif / sparkGeoTS
☆12 · Apr 8, 2016 · Updated 10 years ago
collectivemedia / celos
Scriptable scheduler for periodical Hadoop workflows
☆22 · Feb 1, 2018 · Updated 8 years ago
matthayes / azkaban-rb
A Ruby DSL for creating Azkaban jobs using Rake
☆18 · Jul 26, 2013 · Updated 12 years ago
punya / spark-gradle-test-example
Example demonstrating a Scala project that builds using Gradle, produces a shadow jar suitable for spark-submit, and has tests using Scal…
☆18 · Jun 18, 2015 · Updated 11 years ago
sheepkiller / presto-marathon-docker
On-demand Presto cluster with Mesos, Marathon, and Docker.
☆29 · Mar 7, 2018 · Updated 8 years ago
tmalaska / SparkOnKudu
Based on the design of SparkOnHBase. This repo will support Spark, Spark Streaming, and Spark SQL integration with Kudu.
☆50 · May 19, 2016 · Updated 10 years ago
twitter / algebird
Abstract Algebra for Scala
☆2,299 · Nov 21, 2025 · Updated 8 months ago
gerritjvv / fileape
Writes data to files split by topic and rolled over on size or a timeout; files can be compressed using LZO, Snappy, or gzip.
☆11 · Jul 12, 2021 · Updated 5 years ago
dmatrix / examples
These are some code examples
☆56 · Jan 12, 2020 · Updated 6 years ago
massie / spark-parquet-example
Example project to show how to use Spark to read and write Avro/Parquet files
☆50 · Aug 21, 2013 · Updated 12 years ago
dkuppitz / titan-docker
☆19 · Dec 3, 2014 · Updated 11 years ago
mhausenblas / elsa
Elastic Sentiment Analysis (using Apache Mesos, Marathon and Apache Spark)
☆35 · Mar 16, 2015 · Updated 11 years ago
databricks / spark-tfocs
A Spark port of TFOCS: Templates for First-Order Conic Solvers (cvxr.com/tfocs)
☆90 · Apr 15, 2024 · Updated 2 years ago
collectivemedia / modelmatrix
Sparse feature extraction with Spark
☆30 · Jul 25, 2018 · Updated 7 years ago
adobe-research / spark-cluster-deployment
Automates Spark standalone cluster tasks with Puppet and Fabric.
☆43 · Aug 14, 2014 · Updated 11 years ago
tresata / spark-sorted
Secondary sort and streaming reduce for Apache Spark
☆77 · Jul 3, 2023 · Updated 3 years ago