morsapaes / pyflink-nlp
Self-contained demo using PyFlink with Gensim+spaCy to find topics in the Flink User Mailing List. All you need is Docker! 🐳
21 stars, updated 3 years ago
Alternatives and similar repositories for pyflink-nlp
Users who are interested in pyflink-nlp are comparing it to the libraries listed below.
- Trino dbt demo project to mix BigQuery data with data in a local PostgreSQL database and load the result there (77 stars, updated 4 years ago)
- No description (269 stars, updated last year)
- One-click docker-compose deployment with Kafka, Spark Streaming, Zeppelin UI, and monitoring (Grafana + Kafka Manager) (121 stars, updated 4 years ago)
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. (500 stars, updated 3 weeks ago)
- Great Expectations Airflow operator (169 stars, updated last week)
- Repo for all my code on the articles I post on Medium (107 stars, updated 3 years ago)
- Generate and visualize data lineage from query history (326 stars, updated 2 years ago)
- Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines (134 stars, updated 3 years ago)
- Grafana dashboards and StatsD exporter config for Airflow monitoring (288 stars, updated last year)
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python (44 stars, updated 2 years ago)
- Delta Lake examples (233 stars, updated last year)
- Repository of Helm charts for deploying DataHub on a Kubernetes cluster (196 stars, updated last week)
- Example for the article "Running Spark 3 with standalone Hive Metastore 3.0" (102 stars, updated 2 years ago)
- A Helm chart to install Apache Airflow on Kubernetes (290 stars, updated 3 weeks ago)
- Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow. (168 stars, updated 2 years ago)
- Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture (124 stars, updated 3 weeks ago)
- Repo for the Medium blog post 'Keeping your ML model in shape with Kafka, Airflow and MLFlow' (121 stars, updated 2 years ago)
- A convenient Python wrapper for Apache NiFi (270 stars, updated 3 weeks ago)
- Apache Hive Metastore as a Standalone server in Docker (80 stars, updated last year)
- O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian (223 stars, updated 2 years ago)
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes (64 stars, updated 3 years ago)
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source. (346 stars, updated last year)
- The Trino (https://trino.io/) adapter plugin for dbt (https://getdbt.com) (253 stars, updated 2 months ago)
- Docker with Airflow and Spark standalone cluster (262 stars, updated 2 years ago)
- DBND is an agile pipeline framework that helps data engineering teams track and orchestrate their data processes. (267 stars, updated 8 months ago)
- Helm charts for Trino and Trino Gateway (187 stars, updated last week)
- Tech blogs & talks by companies that run Apache Flink in production (184 stars, updated 2 weeks ago)
- ThirdEye is an integrated tool for real-time monitoring of time series and interactive root-cause analysis. (106 stars, updated 6 months ago)
- Airflow training for the Crunch Conf (104 stars, updated 7 years ago)
- Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow. (375 stars, updated 6 months ago)