morsapaes / pyflink-nlp
Self-contained demo using PyFlink with Gensim+spaCy to find topics in the Flink User Mailing List. All you need is Docker! 🐳
☆21 · Updated 4 years ago
Alternatives and similar repositories for pyflink-nlp
Users interested in pyflink-nlp are comparing it to the libraries listed below.
- Repo for all my code on the articles I post on Medium ☆107 · Updated 3 years ago
- One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager) ☆120 · Updated 4 years ago
- PySpark data-pipeline testing and CICD ☆28 · Updated 5 years ago
- Generate and Visualize Data Lineage from query history ☆327 · Updated 2 years ago
- Delta Lake examples ☆235 · Updated last year
- A simplified, lightweight ETL Framework based on Apache Spark ☆586 · Updated last year
- A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source. ☆346 · Updated last year
- Airflow training for the crunch conf ☆104 · Updated 7 years ago
- Spark style guide ☆271 · Updated last year
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆103 · Updated 2 years ago
- Grafana dashboards and StatsD exporter config for Airflow monitoring ☆289 · Updated last year
- O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian ☆226 · Updated 2 years ago
- Repo that relates to the Medium blog 'Keeping your ML model in shape with Kafka, Airflow and MLFlow' ☆121 · Updated 2 years ago
- Repository of helm charts for deploying DataHub on a Kubernetes cluster ☆197 · Updated 2 weeks ago
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database ☆77 · Updated 4 years ago
- Tool to automate data quality checks on data pipelines ☆256 · Updated 3 years ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆78 · Updated 2 years ago
- For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR ☆67 · Updated 3 years ago
- DBND is an agile pipeline framework that helps data engineering teams track and orchestrate their data processes. ☆267 · Updated 9 months ago
- Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture ☆125 · Updated last month
- Data ingestion library for Amundsen to build graph and search index ☆204 · Updated last year
- Soda Spark is a PySpark library that helps you test your data in Spark DataFrames ☆63 · Updated 3 years ago
- A Helm chart to install Apache Airflow on Kubernetes ☆290 · Updated last week
- Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow. ☆168 · Updated 2 years ago
- ThirdEye is an integrated tool for realtime monitoring of time series and interactive root-cause analysis. ☆108 · Updated 7 months ago
- Making Machine Learning Simple and Scalable with Python, Jupyter Notebook, TensorFlow, Keras, Apache Kafka and KSQL ☆97 · Updated 6 years ago
- Spark on Kubernetes ☆104 · Updated 2 years ago
- Spark on Kubernetes using Helm ☆33 · Updated 5 years ago
- Delta Lake Documentation ☆51 · Updated last year
- A simple Spark-powered ETL framework that just works 🍺 ☆181 · Updated 2 months ago