MammothData / spark-protobuf

☆9

Alternatives and similar repositories for spark-protobuf

Users who are interested in spark-protobuf are comparing it to the libraries listed below.


mvogiatzis / freq-count
Lossy Counting and Sticky Sampling implementation for efficient frequency counts on data streams.
☆63 · Updated 9 years ago
brkyvz / streaming-matrix-factorization
Distributed Streaming Matrix Factorization implemented on Spark for Recommendation Systems
☆106 · Updated 9 years ago
karlhigley / spark-neighbors
Spark-based approximate nearest neighbor search using locality-sensitive hashing
☆104 · Updated 9 years ago
harelba / tail2kafka
Tail a log file and send log lines automatically to a kafka topic
☆57 · Updated 13 years ago
Sotera / correlation-approximation
Spark implementation of the Google Correlate algorithm to quickly find highly correlated vectors in huge datasets
☆93 · Updated 9 years ago
MLnick / glint-fm
Factorization Machines on Spark and Glint
☆25 · Updated 8 years ago
tribbloid / ISpark
An Apache Spark-shell backend for IPython
☆105 · Updated 4 years ago
sjyk / sampleclean-async
☆92 · Updated 9 years ago
takahi-i / likelike
An implementation of locality sensitive hashing with Hadoop
☆57 · Updated 10 years ago
amplab / ampcrowd
A RESTful web service that runs microtasks across multiple crowds, provides quality control techniques, and is easily extensible.
☆52 · Updated 8 years ago
beckgael / Mean-Shift-LSH
Scala/Spark implementation of Distributed Nearest Neighbours Mean Shift using LSH
☆30 · Updated 6 years ago
thunderain-project / thunderain
A Real-Time Analytical Processing (RTAP) example using Spark/Shark
☆51 · Updated 11 years ago
TargetHolding / pyspark-elastic
PySpark for Elastic Search
☆55 · Updated 8 years ago
mrsqueeze / spark-hash
Locality Sensitive Hashing for Apache Spark
☆196 · Updated 8 years ago
tresata / spark-sorted
Secondary sort and streaming reduce for Apache Spark
☆78 · Updated 2 years ago
skrusche63 / spark-piwik
Beyond Piwik Analytics with Scala and Apache Spark
☆46 · Updated 10 years ago
Yelp / yelp_kafka
An extension of the kafka-python package that adds features like multiprocess consumers.
☆39 · Updated last year
marufaytekin / lsh-spark
Locality Sensitive Hashing for Apache Spark
☆87 · Updated 3 years ago
crowdrec / idomaar
CrowdRec reference framework
☆32 · Updated 8 years ago
o19s / lazy-semantic-indexing
Elasticsearch Latent Semantic Indexing experimentation
☆33 · Updated 5 years ago
AirSage / Petrel
Tools for writing, submitting, debugging, and monitoring Storm topologies in pure Python
☆246 · Updated 2 years ago
skrusche63 / spark-fsm
This project provides sequential pattern mining for Apache Spark. The algorithms are based on the work of Philippe Fournier-Viger and co…
☆30 · Updated 10 years ago
ceteri / spark-exercises
Coding exercises for Apache Spark
☆104 · Updated 10 years ago
collectivemedia / spark-ext
Spark Extension: ML transformers, SQL aggregations, etc. that are missing in Apache Spark
☆147 · Updated 9 years ago
lucidworks / auto-phrase-tokenfilter
Lucene Auto Phrase TokenFilter implementation
☆59 · Updated 7 years ago
amplab / velox-modelserver
☆110 · Updated 8 years ago
sasha-polev / aerospark
Aerospike Spark Connector
☆35 · Updated 8 years ago
Aloisius / hadoop-s3a
An AWS SDK-backed FileSystem driver for Hadoop
☆64 · Updated 4 years ago
hbutani / spark-datetime
functionstest
☆33 · Updated 8 years ago
sloanahrens / qbox-blog-code
Code reference from my Qbox blog posts.
☆87 · Updated 10 years ago