CSE601-DataMining / Clustering
Implement three clustering algorithms to find clusters of genes that exhibit similar expression profiles: K-means, Hierarchical Agglomerative clustering with Single Link (Min), and one of density-based, mixture-model, or spectral clustering. Set up a single-node Hadoop cluster on your machine and implement MapReduce K-means. Compare with non-parallel…
☆12 Updated 10 years ago
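As a rough illustration of the MapReduce K-means idea mentioned in the description, here is a minimal single-process sketch in Python. It is not the repository's code and does not use Hadoop: the function names (`map_point`, `reduce_cluster`, `kmeans`) and the toy two-dimensional "profiles" are assumptions made for this example. The map step assigns each point to its nearest centroid, and the reduce step averages the points in each group.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centroids):
    """Map step: emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_cluster(points):
    """Reduce step: average the points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, centroids, max_iter=100):
    """Repeat the map and reduce steps until the centroids stop moving."""
    for _ in range(max_iter):
        groups = defaultdict(list)
        for p in points:
            idx, value = map_point(p, centroids)
            groups[idx].append(value)
        # A centroid with no assigned points keeps its previous position.
        updated = [
            reduce_cluster(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical toy data standing in for gene expression profiles.
profiles = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
print(kmeans(profiles, [(0.0, 0.0), (10.0, 10.0)]))
```

In an actual Hadoop job the map and reduce functions would run as separate tasks over partitions of the data, with one job per iteration; the loop here only mimics that flow in memory.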
Alternatives and similar repositories for Clustering:
Users who are interested in Clustering are comparing it to the libraries listed below
- ☆11 Updated 7 years ago
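- A self-built hadoop + spark + kafka + zookeeper + storm + hbase + hive + flume cluster, with one master and two slave nodes. ☆30 Updated 6 years ago
- Data cleaning system; hadoop; entity recognition; conflict resolution; inconsistency repair; missing-value imputation ☆17 Updated 8 years ago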
- Several implementations for building hbase secondary indexes. ☆39 Updated 9 years ago
- ☆21 Updated 8 years ago
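- An example application of Word2vec and Doc2vec: using Word2vec to compute word similarity and doc2vec to compute sentence similarity. ☆26 Updated 7 years ago
- Spark implementations of user-profile system models such as value, loyalty, churn warning, and activity ☆66 Updated 7 years ago
- Big data [enterprise-grade 360° all-round user profiling]: partial source code for tag development ☆19 Updated 4 years ago
- An application that uses the DBScan algorithm to cluster students by their time spent online ☆20 Updated 8 years ago
- Kafka sends data to Flink, which stores it in mysql; Flink uses SQL statements to aggregate the data stream (with time windows and EventTime) ☆32 Updated 6 years ago
- A collection of examples from learning Spark (with detailed code comments) ☆23 Updated 6 years ago
- Large-scale deduplication of massive data based on Hadoop and HBase ☆29 Updated 6 years ago
- The 12th minor release of the 天亮 (Tianliang) word segmenter ☆8 Updated 11 years ago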
- winutils and hadoop lib for spark on windows_X64 ☆36 Updated 8 years ago
- A core recommendation engine that uses Spark MLlib and Hbase for models and Hive for data cleaning; tested on Spark on Yarn ☆29 Updated 8 years ago
- ☆15 Updated 5 years ago
- dw etl tool: incremental and full extraction from mysql to hive, merging of hive tables, and other data-platform cleaning tools ☆9 Updated 8 years ago
- Showcase for our blog entry about Spring Data Neo4j. ☆31 Updated 11 years ago
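- A curated collection of security-related mind maps ☆11 Updated 9 years ago
- Offline deployment of Spark PMML models ☆12 Updated 2 years ago
- An end-to-end machine learning exercise using Zhihu Daily as the data source, from data acquisition to data analysis: clustering and classifying Zhihu Daily articles and visualizing the process ☆17 Updated 9 years ago
- Machine learning platform based on Spark and Kubernetes ☆30 Updated 7 years ago
- Recommendation algorithms ☆30 Updated 9 years ago
- Spark Streaming + kafka + hbase ☆15 Updated 6 years ago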
- Custom elasticsearch similarity plug-in ☆10 Updated 11 years ago
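- Text deduplication algorithm, developed for deduplicating news in a recommender system. It uses Yahoo's Near-duplicates and shingling algorithm; the server is implemented in C and the client in Java, and they communicate through the thrift framework. For extensibility, deduplication can run on the server, which also exposes computation interfaces so that clients can extend it themselves ☆23 Updated 11 years ago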
- High Performance Spark Streaming with Direct Kafka in Java ☆39 Updated 8 years ago
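- Spark Programming Guide, Simplified Chinese edition ☆33 Updated 8 years ago
- datax plugin for writing to redis; supports importing string, list, and hash types from heterogeneous data sources into redis ☆18 Updated 3 years ago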
- UDF, GenericUDF, UDTF, UDAF ☆12 Updated 2 years ago