CSE601-DataMining / Clustering
Implement three clustering algorithms to find clusters of genes that exhibit similar expression profiles: K-means, Hierarchical Agglomerative clustering with Single Link (Min), and one of density-based, mixture-model, or spectral clustering. Set up a single-node Hadoop cluster on your machine and implement MapReduce K-means. Compare with non-parallel…
☆12 Updated 10 years ago
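As a rough illustration of the MapReduce K-means idea named in the description, the sketch below splits each iteration into a map step (assign a point to its nearest centroid) and a reduce step (average the points of each cluster). It is plain single-process Python, not Hadoop code and not the repository's implementation; the function names `map_assign`, `reduce_mean`, and `kmeans` are hypothetical, and Euclidean distance is an assumption.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step: average all points that share one centroid index."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, centroids, max_iter=100):
    """Iterate map and reduce until the centroids stop changing."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Shuffle phase: group the mapped values by their key.
        groups = defaultdict(list)
        for p in points:
            key, value = map_assign(p, centroids)
            groups[key].append(value)
        # An empty cluster keeps its previous centroid (an assumption).
        new_centroids = [
            reduce_mean(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [map_assign(p, centroids)[0] for p in points]
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

In a real Hadoop job the map and reduce functions would run on separate tasks and the grouping would be done by the framework's shuffle; here a dictionary stands in for that step.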
Alternatives and similar repositories for Clustering:
Users who are interested in Clustering are comparing it to the libraries listed below.
- ☆21 Updated 8 years ago
- ☆11 Updated 7 years ago
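- Computes the similarity between two feature vectors ☆26 Updated 6 years ago
- Machine learning project ☆37 Updated 8 years ago
- Data cleaning system; hadoop; entity recognition; conflict resolution; inconsistency repair; missing-value imputation ☆17 Updated 8 years ago
- ☆15 Updated 5 years ago
- A comprehensive walkthrough of the basic algorithms in Spark MLlib, the machine learning library of the Spark big data framework, with a complete set of test files ☆39 Updated last year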
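- Several implementations for building an hbase secondary index. ☆39 Updated 9 years ago
- A self-built hadoop + spark + kafka + zookeeper + storm + hbase + hive + flume cluster with one master and two worker nodes. ☆30 Updated 6 years ago
- Java implementations of common text clustering algorithms ☆15 Updated 10 years ago
- Web architecture that uses springmvc + phoenix to operate on hbase ☆10 Updated 6 years ago
- Text deduplication algorithm, developed from research on deduplicating news in recommender systems. It uses Yahoo's Near-duplicates and shingling algorithm; the server is implemented in C and the client in Java, and they communicate through the thrift framework. To improve extensibility, deduplication can run on the server, and the server also exposes a computation interface so that clients can extend it themselves ☆23 Updated 11 years ago
- An application example of Word2vec and Doc2vec: computing word similarity with Word2vec and sentence similarity with doc2vec. ☆26 Updated 7 years ago
- Machine learning platform based on Spark and Kubernetes ☆30 Updated 7 years ago
- Submit, start, query, and cancel jobs through Flink's RESTful API ☆20 Updated 2 years ago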
- Big Data tips, such as Spark Core, Streaming, Machine Learning, Deep Learning, etc. ☆17 Updated 6 years ago
- User-profiling system models implemented in Spark: value, loyalty, churn warning, activity, and others ☆66 Updated 7 years ago
- JPMML-SparkML plugin for converting XGBoost4J-Spark models to PMML ☆36 Updated 5 years ago
- SparkSQL data analysis examples ☆23 Updated 8 years ago
- ☆42 Updated 5 years ago
- DBSCAN clustering algorithm implemented in Apache Spark (MapReduce Framework). ☆14 Updated 8 years ago
- ☆14 Updated 2 years ago
- Spark Streaming + kafka + hbase ☆15 Updated 6 years ago
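- Offline deployment of Spark PMML models ☆12 Updated 2 years ago
- Implementation of text clustering algorithms including K-means, MBSAS, DBSCAN. ☆44 Updated 7 years ago
- Pipeline-based machine learning framework built on Scala and Java; a one-stop automated machine learning platform that mainly covers data analysis, feature engineering, machine learning models, automatic deployment, hyperparameter optimization, automatic model optimization, and automatic scaling, allocation, and provisioning, similar to 4Paradigm, Alibaba PAI, Google AutoML, and Amazon SageMaker ☆65 Updated 6 years ago
- Spark Programming Guide, Simplified Chinese edition ☆33 Updated 8 years ago
- A Spark based semantic reasoning engine ☆14 Updated 8 years ago
- winutils and hadoop lib for spark on windows_X64 ☆36 Updated 8 years ago
- A simple, easy-to-use ETL tool ☆17 Updated 6 years ago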