sramirez/spark-infotheoretic-feature-selection


This package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.
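To make the description above concrete, here is a minimal single-machine sketch of greedy selection under Brown's unifying criterion, J(X_k) = I(X_k;Y) - β Σ I(X_j;X_k) + γ Σ I(X_j;X_k|Y), where the sums run over the already selected features. It is not the package's Spark implementation: the function names, the `criterion` strings, and the toy data are all hypothetical, and the inputs are assumed to be already discretized.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X;Y|Z): MI inside each Z stratum, weighted by P(Z=z)."""
    n = len(xs)
    total = 0.0
    for z, count in Counter(zs).items():
        idx = [i for i, v in enumerate(zs) if v == z]
        total += (count / n) * mutual_information(
            [xs[i] for i in idx], [ys[i] for i in idx]
        )
    return total


def greedy_select(features, labels, k, criterion="mrmr"):
    """Greedily pick up to k feature names (hypothetical helper).

    features: dict mapping name -> list of discrete values.
    criterion: "mim" (InfoGain ranking), "mrmr", or "jmi".
    """
    if criterion not in ("mim", "mrmr", "jmi"):
        raise ValueError(f"unknown criterion: {criterion}")
    relevance = {
        name: mutual_information(col, labels) for name, col in features.items()
    }
    selected = []
    while len(selected) < min(k, len(features)):
        best_name, best_score = None, float("-inf")
        for name, col in features.items():
            if name in selected:
                continue
            score = relevance[name]
            if selected and criterion in ("mrmr", "jmi"):
                # Redundancy term with beta = 1/|S|.
                redundancy = sum(
                    mutual_information(col, features[s]) for s in selected
                )
                score -= redundancy / len(selected)
                if criterion == "jmi":
                    # Conditional redundancy term with gamma = 1/|S|.
                    conditional = sum(
                        conditional_mutual_information(col, features[s], labels)
                        for s in selected
                    )
                    score += conditional / len(selected)
            if score > best_score:  # ties keep the earlier feature
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Toy data (made up for illustration): "a_copy" duplicates "a".
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 1, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "b": [0, 1, 0, 0, 1, 1, 0, 1],
}

for name in ("mim", "mrmr", "jmi"):
    print(name, greedy_select(features, labels, k=2, criterion=name))
```

On this toy data the plain relevance ranking keeps the duplicated feature, while the redundancy-aware criteria prefer the weaker but complementary one. A distributed implementation would need to compute the same counts across partitions, which this sketch does not attempt.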

☆134

Alternatives and similar repositories for spark-infotheoretic-feature-selection

Users who are interested in spark-infotheoretic-feature-selection are comparing it to the libraries listed below.

sramirez / spark-MDLP-discretization
Spark implementation of Fayyad's discretizer based on Minimum Description Length Principle (MDLP)
☆43 · Jan 12, 2023 · Updated 3 years ago
MarcKaminski / spark-FeatureSelection
Feature selection methods as Spark MLlib Pipelines
☆30 · Apr 29, 2018 · Updated 8 years ago
wxhC3SC6OPm8M1HXboMy / spark-mrmr-feature-selection
Machine learning enhancements to Spark MLlib
☆20 · Mar 19, 2015 · Updated 11 years ago
LIDIAgroup / SparkFeatureSelection
Generic implementation of Information Theory-based Feature Selection methods. It also contains an Entropy Minimization Discretization imp…
☆19 · Jul 21, 2014 · Updated 12 years ago
collectivemedia / spark-ext
Spark Extension: ML transformers, SQL aggregations, etc. that are missing in Apache Spark
☆145 · Jan 26, 2016 · Updated 10 years ago
JMailloH / kNN_IS
☆25 · Mar 12, 2018 · Updated 8 years ago
saurfang / spark-tsne
Distributed t-SNE via Apache Spark
☆158 · Dec 9, 2017 · Updated 8 years ago
crackcell / mlfeature
Feature engineering toolkit for Spark MLlib.
☆12 · Apr 1, 2017 · Updated 9 years ago
collectivemedia / modelmatrix
Sparse feature extraction with Spark
☆30 · Jul 25, 2018 · Updated 7 years ago
zhengruifeng / SparkGBM
Spark-based GBM
☆56 · Feb 19, 2020 · Updated 6 years ago
takuti / stream-feature-selection
Implementation of the unsupervised feature selection algorithm proposed by [Huang, et al. 2015]
☆10 · Dec 25, 2015 · Updated 10 years ago
elbaulp / DPASF
Final project for my MSc in Data Science: a library for Data Pre-processing Algorithms for Streaming in Flink (DPASF)
☆18 · Jul 1, 2019 · Updated 7 years ago
cloudml / zen
Zen aims to provide the largest scale and the most efficient machine learning platform on top of Spark, including but not limited to logi…
☆169 · Nov 17, 2018 · Updated 7 years ago
TrueCar / mleap-demo
MLeap demo repository for use with MLeap blog posts
☆11 · Jul 13, 2016 · Updated 10 years ago
kevinykuo / sparklygraphs
Old repo for R interface for GraphFrames
☆13 · Mar 21, 2018 · Updated 8 years ago
modal-inria / MixtComp
Model-based clustering package for mixed data
☆13 · May 21, 2026 · Updated 2 months ago
memsql / streamliner-examples
Example code for building your own MemSQL Streamliner Pipelines
☆23 · Apr 18, 2017 · Updated 9 years ago
Lewuathe / dllib
dllib is a distributed deep learning library running on Apache Spark
☆32 · Oct 26, 2017 · Updated 8 years ago
JJ / 1line-py
Teaching computational thinking through Python one-liners
☆42 · Jun 18, 2022 · Updated 4 years ago
intel-spark / SparseML
Spark MLlib code optimized to efficiently support sparse data
☆51 · Dec 22, 2016 · Updated 9 years ago
autodeployai / pmml4s-spark
PMML scoring library for Spark as SparkML Transformer
☆21 · Oct 20, 2024 · Updated last year
combust / mleap
MLeap: Deploy ML Pipelines to Production
☆1,539 · Updated this week
enriquegrodrigo / spark-crowd
A package for dealing with crowdsourced big data. Website: https://enriquegrodrigo.github.io/spark-crowd/
☆64 · Dec 5, 2018 · Updated 7 years ago
ddf-project / DDF
Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data Engine
☆169 · Feb 26, 2021 · Updated 5 years ago
linkedin / photon-ml
A scalable machine learning library on Apache Spark
☆797 · Aug 30, 2021 · Updated 4 years ago
viirya / SparkAffinityPropagation
Affinity Propagation on Spark
☆20 · May 31, 2021 · Updated 5 years ago
saurfang / spark-knn
k-Nearest Neighbors algorithm on Spark
☆241 · Nov 14, 2023 · Updated 2 years ago
ScalaWilliam / scala-native-libpcap
Experiments with Scala Native & libpcap
☆10 · Mar 30, 2018 · Updated 8 years ago
sparkling-graph / sparkling-graph
SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.
☆154 · Jul 31, 2020 · Updated 5 years ago
selvinsource / spark-pmml-exporter-validator
Using JPMML Evaluator to validate the PMML models exported from Spark
☆19 · May 1, 2017 · Updated 9 years ago
thomasjungblut / tjungblut-online-ml
Online Machine Learning Algorithms
☆30 · Jun 14, 2023 · Updated 3 years ago
kunguang / SelectFeature
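Mainly addresses three problems in CTR prediction engineering: feature selection, feature numbering (feature discretization), and single-feature AUC and logloss.
☆20 · Mar 30, 2017 · Updated 9 years ago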
deepspark / deepspark
Deep learning framework running on Spark
☆63 · Dec 16, 2023 · Updated 2 years ago
BenFradet / spark-kaggle
Different entries to Kaggle contests using Apache Spark
☆13 · Jun 5, 2017 · Updated 9 years ago
armandgrillet / stsc
An implementation of the Self-Tuning Spectral Clustering algorithm, and more.
☆12 · Sep 4, 2016 · Updated 9 years ago
databricks / spark-corenlp
Stanford CoreNLP wrapper for Apache Spark
☆419 · Nov 15, 2018 · Updated 7 years ago
databricks / spark-deep-learning
Deep Learning Pipelines for Apache Spark
☆1,989 · Mar 30, 2023 · Updated 3 years ago
h2oai / sparkling-water
Sparkling Water provides H2O functionality inside a Spark cluster
☆979 · Nov 5, 2025 · Updated 8 months ago
gbraccialli / SparkUtils
☆11 · Dec 10, 2015 · Updated 10 years ago