ververica / pyflink-nlp
Self-contained demo using PyFlink with Gensim+spaCy to find topics in the Flink User Mailing List. All you need is Docker! 🐳
⭐10 · Updated 4 years ago
Alternatives and similar repositories for pyflink-nlp
Users interested in pyflink-nlp are comparing it to the libraries listed below.
- This repository contains recipes for Apache Pinot. ⭐30 · Updated 3 months ago
- Code that was used as an example during the Data+AI Summit 2020 ⭐15 · Updated 4 years ago
- Apache Spark-based Data Flow (ETL) framework that supports multiple read and write destinations of different types and also supports multiple… ⭐26 · Updated 4 years ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ⭐29 · Updated this week
- phData Pulse application log aggregation and monitoring ⭐13 · Updated 5 years ago
- Friendly ML feature store ⭐45 · Updated 3 years ago
- A Data Mesh demo repository ⭐13 · Updated 8 months ago
- A curated list of awesome PrestoDB / Trino software, libraries, tools and resources ⭐17 · Updated 3 years ago
- Spark package to "plug" holes in data using SQL based rules ⚡️ ⭐29 · Updated 5 years ago
- The sane way of building a data layer in Airflow ⭐24 · Updated 5 years ago
- The Internals of PySpark ⭐26 · Updated 5 months ago
- Data validation library for PySpark 3.0.0 ⭐33 · Updated 2 years ago
- A library on top of either pex or conda-pack to make your Python code easily available on a cluster ⭐45 · Updated 6 months ago
- Scalable CDC Pattern Implemented using PySpark ⭐18 · Updated 5 years ago
- A K8s-based infrastructure for analytics ⭐24 · Updated 5 years ago
- ⭐31 · Updated 2 years ago
- PySpark phonetic and string matching algorithms ⭐39 · Updated last year
- An implementation of the DatasourceV2 interface of Apache Spark™ for writing Spark Datasets to Apache Druid™. ⭐43 · Updated last week
- ⭐31 · Updated 5 years ago
- ⭐21 · Updated last year
- Data Sketches for Apache Spark ⭐22 · Updated 2 years ago
- Demonstration of a Hive Input Format for Iceberg ⭐26 · Updated 4 years ago
- Read Delta tables without any Spark ⭐47 · Updated last year
- A library that brings useful functions from various modern database management systems to Apache Spark ⭐59 · Updated last year
- Yet Another (Spark) ETL Framework ⭐21 · Updated last year
- Spark Application UI extension for JupyterLab ⭐10 · Updated 3 years ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ⭐61 · Updated 9 months ago
- A Spark datasource for the HadoopCryptoLedger library ⭐13 · Updated 2 years ago
- A light Kafka to HDFS/S3 ETL library based on Apache Spark ⭐41 · Updated 7 years ago
- Examples for High Performance Spark ⭐16 · Updated 7 months ago