myui/hivemall

Scalable machine learning library for Apache Hive/Spark/Pig

☆501

Alternatives and similar repositories for hivemall

Users that are interested in hivemall are comparing it to the libraries listed below.

apache / incubator-hivemall
Mirror of Apache Hivemall (incubating)
☆313 · Sep 6, 2022 · Updated 3 years ago
maropu / hivemall-spark
A Hivemall wrapper for Spark
☆31 · Apr 21, 2016 · Updated 10 years ago
jubatus / jubatus
Framework and Library for Distributed Online Machine Learning
☆708 · May 16, 2019 · Updated 7 years ago
livingsocial / HiveSwarm
Helpful user defined functions / table generating functions for Hive
☆102 · May 2, 2016 · Updated 10 years ago
xerial / silk
Simplify SQL Workflows with Scala
☆38 · Mar 13, 2020 · Updated 6 years ago
hbutani / SQLWindowing
SQL Windowing Functions for Hadoop
☆65 · Jun 20, 2022 · Updated 4 years ago
LinkedInAttic / white-elephant
Hadoop log aggregator and dashboard
☆190 · Oct 29, 2013 · Updated 12 years ago
YahooArchive / samoa
SAMOA (Scalable Advanced Massive Online Analysis) is an open-source platform for mining big data streams.
☆427 · Mar 28, 2016 · Updated 10 years ago
forcedotcom / phoenix
☆558 · Feb 12, 2022 · Updated 4 years ago
amplab / shark
Development in Shark has been ended.
☆992 · Aug 11, 2015 · Updated 10 years ago
twitter-archive / ambrose
A platform for visualization and real-time monitoring of data workflows
☆1,170 · Jan 22, 2020 · Updated 6 years ago
twitter / elephant-bird
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.
☆1,134 · Apr 10, 2023 · Updated 3 years ago
facebookarchive / hive-dwrf
DWRF file format for Hive
☆77 · Nov 8, 2018 · Updated 7 years ago
cdapio / tephra
Apache Tephra: Transactions for HBase.
☆159 · Sep 13, 2024 · Updated last year
adobe-research / spindle
Next-generation web analytics processing with Scala, Spark, and Parquet.
☆330 · Mar 28, 2015 · Updated 11 years ago
embulk / embulk
Embulk: Pluggable Bulk Data Loader.
☆1,783 · Jun 19, 2026 · Updated last month
tdunning / pig-vector
Mahout vector encoding for pig
☆53 · Nov 20, 2022 · Updated 3 years ago
LinkedInAttic / datafu
Hadoop library for large-scale data processing, now an Apache Incubator project
☆581 · Jul 8, 2014 · Updated 12 years ago
alienrobotwizard / varaha
Machine learning and natural language processing with Apache Pig
☆53 · Dec 17, 2013 · Updated 12 years ago
collectivemedia / spark-ext
Spark Extension : ML transformers, SQL aggregations, etc that are missing in Apache Spark
☆145 · Jan 26, 2016 · Updated 10 years ago
apache / gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, orga…
☆2,269 · Updated this week
twitter / hraven
hRaven collects run time data and statistics from MapReduce jobs in an easily queryable format
☆129 · Jan 14, 2022 · Updated 4 years ago
Cascading / pattern
Machine Learning for Cascading
☆85 · Jun 12, 2015 · Updated 11 years ago
twitter / summingbird
Streaming MapReduce with Scalding and Storm
☆2,123 · Jan 19, 2022 · Updated 4 years ago
sonalgoyal / hiho
Hadoop Data Integration with various databases, ftp servers, salesforce. Incremental update, dedup, append, merge your data on Hadoop.
☆92 · Apr 11, 2013 · Updated 13 years ago
LinkedInAttic / camus
LinkedIn's previous generation Kafka to HDFS pipeline.
☆879 · Aug 27, 2020 · Updated 5 years ago
kawaa / Beetest
A super simple utility for testing Apache Hive scripts locally for non-Java developers.
☆73 · Feb 11, 2017 · Updated 9 years ago
tresata / ganitha
scalding powered machine learning
☆109 · Nov 18, 2014 · Updated 11 years ago
julianhyde / optiq
Obsolete - superseded by Apache Calcite
☆237 · Jan 20, 2021 · Updated 5 years ago
OryxProject / oryx
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
☆1,783 · Aug 16, 2021 · Updated 4 years ago
recruit-tech / xchainer
An extension module for the neural network library chainer.
☆17 · Sep 4, 2015 · Updated 10 years ago
flink-taiwan / hadoopcon2016-training
Flink training workshop for HadoopCon 2016 (Annual Hadoop Conference @ Taiwan)
☆11 · Sep 14, 2016 · Updated 9 years ago
facebookarchive / hive-io-experimental
Hive I/O Library
☆67 · Oct 28, 2021 · Updated 4 years ago
jpatanooga / KnittingBoar
Parallel Iterative Algorithm (SGD) on Hadoop's YARN framework
☆43 · Jan 30, 2013 · Updated 13 years ago
cloudml / zen
Zen aims to provide the largest scale and the most efficient machine learning platform on top of Spark, including but not limited to logi…
☆169 · Nov 17, 2018 · Updated 7 years ago
h2oai / h2o-2
Please visit https://github.com/h2oai/h2o-3 for latest H2O
☆2,254 · Oct 24, 2024 · Updated last year
nexr / RHive
RHive is an R extension facilitating distributed computing via Apache Hive.
☆121 · Jul 19, 2017 · Updated 9 years ago
amplab / keystone
Simplifying robust end-to-end machine learning on Apache Spark.
☆473 · Apr 18, 2017 · Updated 9 years ago
guoding83128 / OpenDL
The Deep Learning training framework on Spark
☆221 · May 3, 2025 · Updated last year