ververica / pyflink-nlp
Self-contained demo using PyFlink with Gensim+spaCy to find topics in the Flink User Mailing List. All you need is Docker! 🐳
☆10 · Updated 4 years ago
Alternatives and similar repositories for pyflink-nlp
Users that are interested in pyflink-nlp are comparing it to the libraries listed below
- Friendly ML feature store ☆46 · Updated 3 years ago
- A curated list of awesome PrestoDB / Trino software, libraries, tools and resources ☆17 · Updated 4 years ago
- Demonstration of a Hive Input Format for Iceberg ☆26 · Updated 4 years ago
- phData Pulse application log aggregation and monitoring ☆13 · Updated 5 years ago
- An implementation of the DatasourceV2 interface of Apache Spark™ for writing Spark Datasets to Apache Druid™. ☆43 · Updated 3 weeks ago
- The Internals of PySpark ☆26 · Updated 6 months ago
- Flink stream filtering examples ☆19 · Updated 9 years ago
- Code that was used as an example during the Data+AI Summit 2020 ☆15 · Updated 4 years ago
- This repository contains recipes for Apache Pinot. ☆30 · Updated 4 months ago
- This repository contains a recipe for bootstrapping a climate analysis application using Apache Pinot and Superset ☆20 · Updated 4 years ago
- Scalable CDC Pattern Implemented using PySpark ☆18 · Updated 6 years ago
- Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs … ☆158 · Updated 2 years ago
- Provide functionality to build statistical models to repair dirty tabular data in Spark ☆12 · Updated 2 years ago
- ☆63 · Updated 5 years ago
- Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌 ☆29 · Updated 5 years ago
- The sane way of building a data layer in Airflow ☆24 · Updated 5 years ago
- Data validation library for PySpark 3.0.0 ☆33 · Updated 2 years ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆29 · Updated last week
- ☆58 · Updated 11 months ago
- Observability Python library - Powered by Kensu ☆21 · Updated 9 months ago
- Helpers & syntactic sugar for PySpark. ☆62 · Updated 2 years ago
- A Spark datasource for the HadoopCryptoLedger library ☆13 · Updated 2 years ago
- ☆22 · Updated 6 years ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆96 · Updated last week
- Schema Registry integration for Apache Spark ☆40 · Updated 2 years ago
- Apache-Spark based Data Flow (ETL) Framework which supports multiple read, write destinations of different types and also support multiple… ☆26 · Updated 4 years ago
- Basic framework utilities to quickly start writing production ready Apache Spark applications ☆36 · Updated 7 months ago
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆29 · Updated 7 months ago
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆61 · Updated 10 months ago
- Data Catalog for Databases and Data Warehouses ☆35 · Updated last year