This package contains a generic implementation of greedy information-theoretic feature selection (FS) methods. The implementation is based on the common theoretical framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI, and other commonly used FS filters are provided.
☆135 · May 5, 2022 · Updated 3 years ago
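To make the idea of a greedy information-theoretic filter concrete, here is a minimal single-machine sketch in plain Python. It is not the package's Spark API: the function names (`entropy`, `mutual_info`, `cond_mutual_info`, `greedy_select`), the `criterion` strings, and the toy data are all hypothetical and chosen for illustration. It assumes discrete features and scores each candidate as relevance minus averaged redundancy (mRMR), with an added averaged conditional-redundancy term for JMI, and relevance alone for a plain mutual-information ranking (`"mim"`).

```python
from collections import Counter
from math import log2

def entropy(*cols):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*cols))
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())

def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)

def cond_mutual_info(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

def greedy_select(features, label, k, criterion="mrmr"):
    """Greedily pick up to k feature names (hypothetical helper, not the package API).

    features: dict mapping feature name -> list of discrete values.
    criterion: "mim" (relevance only), "mrmr", or "jmi".
    """
    relevance = {f: mutual_info(col, label) for f, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        best, best_score = None, float("-inf")
        for f, col in features.items():
            if f in selected:
                continue
            score = relevance[f]
            if selected and criterion != "mim":
                # Penalize average redundancy with already selected features.
                score -= sum(mutual_info(features[s], col) for s in selected) / len(selected)
                if criterion == "jmi":
                    # JMI adds back the average class-conditional redundancy.
                    score += sum(cond_mutual_info(features[s], col, label)
                                 for s in selected) / len(selected)
            if score > best_score:  # ties keep the earlier feature
                best, best_score = f, score
        selected.append(best)
    return selected

# Toy data: the label is determined by two bits; "b" duplicates "a", "d" is noise.
label = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 1, 1, 1, 1],
    "c": [0, 0, 1, 1, 0, 0, 1, 1],
    "d": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(greedy_select(features, label, 2, "mim"))   # ['a', 'b']
print(greedy_select(features, label, 2, "mrmr"))  # ['a', 'c']
```

On this toy data the relevance-only ranking selects the duplicate feature `b`, while the redundancy-aware criteria skip it in favor of the complementary feature `c`. A distributed implementation would compute the same quantities from aggregated contingency tables rather than in-memory lists.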
Alternatives and similar repositories for spark-infotheoretic-feature-selection
Users who are interested in spark-infotheoretic-feature-selection are comparing it to the libraries listed below.
- Spark implementation of Fayyad's discretizer based on Minimum Description Length Principle (MDLP) ☆43 · Jan 12, 2023 · Updated 3 years ago
- Feature selection methods as Spark MLlib Pipelines ☆31 · Apr 29, 2018 · Updated 7 years ago
- Machine learning enhancements to Spark MLlib ☆20 · Mar 19, 2015 · Updated 10 years ago
- Spark Extension: ML transformers, SQL aggregations, etc. that are missing in Apache Spark ☆146 · Jan 26, 2016 · Updated 10 years ago
- Distributed t-SNE via Apache Spark ☆160 · Dec 9, 2017 · Updated 8 years ago
- Feature engineering toolkit for Spark MLlib. ☆12 · Apr 1, 2017 · Updated 8 years ago
- Spark MLlib code optimized to efficiently support sparse data ☆51 · Dec 22, 2016 · Updated 9 years ago
- MLeap demo repository for use with MLeap blog posts ☆11 · Jul 13, 2016 · Updated 9 years ago
- Spark-based GBM ☆56 · Feb 19, 2020 · Updated 6 years ago
- Zen aims to provide the largest scale and the most efficient machine learning platform on top of Spark, including but not limited to logi… ☆170 · Nov 17, 2018 · Updated 7 years ago
- Sparse feature extraction with Spark ☆30 · Jul 25, 2018 · Updated 7 years ago
- MLeap: Deploy ML Pipelines to Production ☆1,535 · Jan 12, 2026 · Updated last month
- Affinity Propagation on Spark ☆20 · May 31, 2021 · Updated 4 years ago
- A scalable machine learning library on Apache Spark ☆796 · Aug 30, 2021 · Updated 4 years ago
- Deep learning framework running on Spark ☆61 · Dec 16, 2023 · Updated 2 years ago
- Using JPMML Evaluator to validate the PMML models exported from Spark ☆19 · May 1, 2017 · Updated 8 years ago
- Trivial Spark app that counts Titan vertices ☆10 · Mar 4, 2015 · Updated 11 years ago
- An implementation of Factorization Machines (LibFM) ☆250 · Aug 13, 2018 · Updated 7 years ago
- Example code for building your own MemSQL Streamliner Pipelines ☆23 · Apr 18, 2017 · Updated 8 years ago
- PMML scoring library for Spark as SparkML Transformer ☆22 · Oct 20, 2024 · Updated last year
- Vector-free L-BFGS implementation for Spark MLlib ☆46 · Jun 23, 2017 · Updated 8 years ago
- The machine learning component of Open Network Insight: scalable analytics combining Spark for big data and C / MPI for high performance … ☆13 · Nov 9, 2016 · Updated 9 years ago
- Feature selection problem is one of the most significant issues in data classification. The purpose of feature selection is selection of … ☆10 · Jan 29, 2020 · Updated 6 years ago
- This package contains the code for executing clustering validity indices in Spark. The package includes BD-Silhouette, BD-Dunn, Davies-Bo… ☆10 · Oct 29, 2018 · Updated 7 years ago
- Machine Intelligence Toolkits- based on Parameter Server that Efficient Distributed Communication Framework and Alternating Direction Mu… ☆11 · May 6, 2018 · Updated 7 years ago
- k-Nearest Neighbors algorithm on Spark ☆240 · Nov 14, 2023 · Updated 2 years ago
- Some popular algorithms (DBSCAN, kNN, FM, etc.) on Spark ☆32 · May 29, 2018 · Updated 7 years ago
- Sparkling Water provides H2O functionality inside Spark cluster ☆977 · Nov 5, 2025 · Updated 3 months ago
- SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX. ☆153 · Jul 31, 2020 · Updated 5 years ago
- Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data Engine ☆168 · Feb 26, 2021 · Updated 5 years ago
- dllib is a distributed deep learning library running on Apache Spark ☆32 · Oct 26, 2017 · Updated 8 years ago
- ☆12 · Sep 26, 2019 · Updated 6 years ago
- Mirror of Apache Samoa (Incubating) ☆251 · Apr 16, 2023 · Updated 2 years ago
- Java library and command-line application for converting Apache Spark ML pipelines to PMML ☆270 · Feb 6, 2026 · Updated 3 weeks ago
- A library for time series analysis on Apache Spark ☆1,196 · Oct 13, 2020 · Updated 5 years ago
- An Apache Spark app for making data movement between Apache Hive and Apache Phoenix/HBase ☆14 · Mar 23, 2016 · Updated 9 years ago
- Feature Selection by Optimized LASSO algorithm ☆17 · May 3, 2017 · Updated 8 years ago
- Another, hopefully better, implementation of ALS on Spark ☆14 · May 20, 2015 · Updated 10 years ago
- Project, source code and data related to the 2nd edition of Scala for Machine Learning (2017) ☆15 · Jan 14, 2018 · Updated 8 years ago