morsapaes / pyflink-nlp
Self-contained demo using PyFlink with Gensim+spaCy to find topics in the Flink User Mailing List. All you need is Docker! 🐳
★21 · Updated 3 years ago
Alternatives and similar repositories for pyflink-nlp
Users interested in pyflink-nlp are comparing it to the libraries listed below.
- Self-contained demo using PyFlink with Gensim+spaCy to find topics in the Flink User Mailing List. All you need is Docker! 🐳 ★10 · Updated 4 years ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ★98 · Updated 2 years ago
- ★53 · Updated last year
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ★94 · Updated last week
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them ★135 · Updated last year
- Docker image for Apache Hive Metastore ★71 · Updated 2 years ago
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database ★75 · Updated 3 years ago
- Spark on Kubernetes using Helm ★34 · Updated 4 years ago
- Adapter for dbt that executes dbt pipelines on Apache Flink ★95 · Updated last year
- ★54 · Updated 9 months ago
- The Internals of Spark on Kubernetes ★71 · Updated 3 years ago
- Tech blogs & talks by companies that run Apache Flink in production ★172 · Updated 3 months ago
- Delta Lake helper methods. No Spark dependency. ★23 · Updated 8 months ago
- Repository of helm charts for deploying DataHub on a Kubernetes cluster ★185 · Updated last month
- Helm charts for Trino and Trino Gateway ★165 · Updated 2 weeks ago
- PySpark phonetic and string matching algorithms ★39 · Updated last year
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python ★44 · Updated 2 years ago
- ★80 · Updated 3 weeks ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ★122 · Updated this week
- The Internals of PySpark ★26 · Updated 4 months ago
- A repository containing materials for Stateful Functions workshop ★44 · Updated last year
- Apache Flink (Pyflink) and Related Projects ★39 · Updated last month
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ★63 · Updated 2 years ago
- REST API for Apache Spark on K8S or YARN ★98 · Updated this week
- A Python package to submit and manage Apache Spark applications on Kubernetes. ★41 · Updated last month
- Friendly ML feature store ★45 · Updated 3 years ago
- Data validation library for PySpark 3.0.0 ★33 · Updated 2 years ago
- Generate and Visualize Data Lineage from query history ★325 · Updated last year
- Kubernetes operator for managing the lifecycle of Apache Flink and Beam applications. ★215 · Updated 2 weeks ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ★47 · Updated 2 years ago