shubh1205 / pyspark-kafka-boilerplate
A boilerplate with the dependencies needed for PySpark (3.3.0) and MongoDB (>4.x) connectivity
☆10Updated last year
Alternatives and similar repositories for pyspark-kafka-boilerplate
Users who are interested in pyspark-kafka-boilerplate are comparing it to the libraries listed below
- A simple CLI command that initialises a Kedro project from an existing Python package☆11Updated last year
- Simple implementation of scientific paper 'GAC: Graph-Based Alert Correlation for the Detection of Distributed Multi-Step Attacks'☆19Updated 6 years ago
- Anomaly Detection Pipeline with Isolation Forest model and Kedro framework☆24Updated 2 years ago
- Base Kafka Producer, consumer, flask api and PySpark Structured streaming Job☆11Updated 3 years ago
- A tutorial on how to use kedro-mlflow plugin (https://github.com/Galileo-Galilei/kedro-mlflow) to synchronize training and inference and …☆40Updated 3 years ago
- ☆16Updated 8 months ago
- AutoML 2024: HPOD: Hyperparameter Optimization for Unsupervised Outlier Detection☆12Updated last year
- Analyzing multiple multivariate time series datasets and using LSTMs and Nonparametric Dynamic Thresholding to detect anomalies across va…☆21Updated 3 years ago
- Official repository of the paper "Interpretable Anomaly Detection with DIFFI: Depth-based Isolation Forest Feature Importance", M. Carlet…☆29Updated last year
- Evaluate real and synthetic datasets against each other☆92Updated last month
- Evaluation Tool for Anomaly Detection Algorithms on Time Series☆137Updated this week
- Explaining Anomalies Detected by Autoencoders Using SHAP☆43Updated 4 years ago
- Predict if a reservation will be canceled using robust Machine Learning pipelines with Airflow and Mlflow☆64Updated last year
- Jithsaavvy / Explaining-deep-learning-models-for-detecting-anomalies-in-time-series-data-RnD-project: This research work focuses on comparing the existing approaches to explain the decisions of models trained using time-series data and pro…☆29Updated 3 years ago
- Material for the PySpark course☆14Updated 4 months ago
- ☆18Updated 11 months ago
- Workshop "From zero to MLOps: An open source stack to fight spaghetti ML"☆25Updated last year
- nbsynthetic is a simple and robust tabular synthetic data generation library for small and medium size datasets☆68Updated 2 years ago
- Adding feature_importances_ property to sklearn.cluster.KMeans class☆64Updated 2 years ago
- Find data quality issues and clean your data in a single line of code with a Scikit-Learn compatible Transformer.☆132Updated last year
- Feature engineering package with sklearn like functionality☆56Updated last year
- Interpretation of Isolation Forests☆21Updated last year
- ☆42Updated 3 months ago
- PyData London 2024 Prefect Workshop☆16Updated last year
- Repository for Content-Aware Transformer☆14Updated 2 years ago
- The practical use-cases of how to make your Machine Learning Pipelines robust and reliable using Apache Airflow.☆52Updated 2 years ago
- Various utilities for time series forecasting☆10Updated 2 years ago
- This repository is about how to deploy and serve a machine learning model with FastAPI, using MLFlow-MINIO☆19Updated 2 years ago
- ☆12Updated 3 years ago
- sktime - python toolbox for time series: pipelines and transformers☆24Updated 2 years ago