This package contains a generic implementation of greedy information-theoretic feature selection (FS) methods. The implementation is based on the common theoretical framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI, and other commonly used FS filters are provided.
☆134 · May 5, 2022 · Updated 4 years ago
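As a rough illustration of what a greedy information-theoretic filter does, the sketch below implements the mRMR criterion in plain Python: at each step it picks the feature with the highest mutual information with the labels, minus its mean mutual information with the features already selected. This is not the package's API. The function names, the dictionary-of-columns input, and the toy data are all hypothetical, and the sketch assumes discrete features and runs on a single machine rather than on Spark.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def greedy_mrmr(features, labels, k):
    """Greedily select up to k feature names (hypothetical helper, not the package API).

    Score of a candidate = relevance I(X;Y) minus the mean redundancy
    I(X;X_j) over the already selected features X_j.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # Iterate in sorted order so ties are broken deterministically by name.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumed for illustration): the label encodes two independent bits,
# "a" and "c"; "b" is an exact copy of "a" and is therefore fully redundant.
labels = [0, 1, 2, 3, 0, 1, 2, 3]
features = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(greedy_mrmr(features, labels, 2))  # prints ['a', 'c']
```

On this toy data all three features are equally relevant, but once "a" is selected the duplicate "b" is penalized for redundancy, so "c" is chosen second. Other filters in the same family differ mainly in how the redundancy term is weighted or conditioned on the labels.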
Alternatives and similar repositories for spark-infotheoretic-feature-selection
Users that are interested in spark-infotheoretic-feature-selection are comparing it to the libraries listed below.
- Spark implementation of Fayyad's discretizer based on Minimum Description Length Principle (MDLP) ☆43 · Jan 12, 2023 · Updated 3 years ago
- Feature selection methods as Spark MLlib Pipelines ☆30 · Apr 29, 2018 · Updated 8 years ago
- Machine learning enhancements to Spark MLlib ☆20 · Mar 19, 2015 · Updated 11 years ago
- Practice and Workshop on BigData and Cloud Computing using Docker Containers and OpenNebula. HDFS, Hadoop, and Spark+R ☆11 · Mar 16, 2017 · Updated 9 years ago
- Generic implementation of Information Theory-based Feature Selection methods. It also contains an Entropy Minimization Discretization imp… ☆19 · Jul 21, 2014 · Updated 11 years ago
- Spark Extension: ML transformers, SQL aggregations, etc that are missing in Apache Spark ☆145 · Jan 26, 2016 · Updated 10 years ago
- ☆25 · Mar 12, 2018 · Updated 8 years ago
- Distributed t-SNE via Apache Spark ☆159 · Dec 9, 2017 · Updated 8 years ago
- Sparse feature extraction with Spark ☆30 · Jul 25, 2018 · Updated 7 years ago
- Implementation of unsupervised feature selection algorithm proposed by [Huang, et al. 2015] ☆10 · Dec 25, 2015 · Updated 10 years ago
- Zen aims to provide the largest scale and the most efficient machine learning platform on top of Spark, including but not limited to logi… ☆169 · Nov 17, 2018 · Updated 7 years ago
- My MSc on Data Science final project. This is a library for Data Pre-processing Algorithms for Streaming in Flink (DPASF) ☆18 · Jul 1, 2019 · Updated 7 years ago
- This package contains the code for executing clustering validity indices in Spark. The package includes BD-Silhouette, BD-Dunn, Davies-Bo… ☆10 · Oct 29, 2018 · Updated 7 years ago
- MLeap demo repository for use with MLeap blog posts ☆11 · Jul 13, 2016 · Updated 9 years ago
- Old repo for R interface for GraphFrames ☆13 · Mar 21, 2018 · Updated 8 years ago
- dllib is a distributed deep learning library running on Apache Spark ☆32 · Oct 26, 2017 · Updated 8 years ago
- Spark MLlib code optimized to efficiently support sparse data ☆51 · Dec 22, 2016 · Updated 9 years ago
- PMML scoring library for Spark as SparkML Transformer ☆21 · Oct 20, 2024 · Updated last year
- Example code for building your own MemSQL Streamliner Pipelines ☆23 · Apr 18, 2017 · Updated 9 years ago
- MLeap: Deploy ML Pipelines to Production ☆1,539 · Mar 10, 2026 · Updated 3 months ago
- Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data Engine ☆169 · Feb 26, 2021 · Updated 5 years ago
- A scalable machine learning library on Apache Spark ☆797 · Aug 30, 2021 · Updated 4 years ago
- Final career project on PAC theory and imbalanced datasets ☆18 · Sep 6, 2020 · Updated 5 years ago
- k-Nearest Neighbors algorithm on Spark ☆242 · Nov 14, 2023 · Updated 2 years ago
- Haskell resources ☆18 · Mar 29, 2019 · Updated 7 years ago
- Experiments with scala native & libpcap ☆10 · Mar 30, 2018 · Updated 8 years ago
- A package for dealing with crowdsourced big data. Website: https://enriquegrodrigo.github.io/spark-crowd/ ☆64 · Dec 5, 2018 · Updated 7 years ago
- Mainly addresses three problems in CTR prediction engineering: feature selection, feature indexing (feature discretization), and single-feature AUC and logloss. ☆20 · Mar 30, 2017 · Updated 9 years ago
- SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX. ☆154 · Jul 31, 2020 · Updated 5 years ago
- Affinity Propagation on Spark ☆20 · May 31, 2021 · Updated 5 years ago
- Using JPMML Evaluator to validate the PMML models exported from Spark ☆19 · May 1, 2017 · Updated 9 years ago
- Deep learning framework running on Spark ☆62 · Dec 16, 2023 · Updated 2 years ago
- Online Machine Learning Algorithms ☆30 · Jun 14, 2023 · Updated 3 years ago
- An implementation of the Self-Tuning Spectral Clustering algorithm, and more. ☆12 · Sep 4, 2016 · Updated 9 years ago
- Different entries to Kaggle contests using Apache Spark ☆13 · Jun 5, 2017 · Updated 9 years ago
- PyPI package to calculate comprehensive confidence intervals for classification positive rate, precision, NPV, and recall using a labeled… ☆10 · Jul 6, 2023 · Updated 2 years ago
- Model-based clustering package for mixed data ☆13 · May 21, 2026 · Updated last month
- ☆11 · Dec 10, 2015 · Updated 10 years ago
- Trivial Spark app that counts Titan vertices ☆10 · Mar 4, 2015 · Updated 11 years ago