ververica / pyflink-nlp

Self-contained demo using PyFlink with Gensim+spaCy to find topics in the Flink User Mailing List. All you need is Docker! 🐳

☆10

Alternatives and similar repositories for pyflink-nlp

Users who are interested in pyflink-nlp are comparing it to the repositories listed below

findify / featury
Friendly ML feature store
☆46 Updated 3 years ago
youngwookim / awesome-presto
A curated list of awesome PrestoDB / Trino software, libraries, tools and resources
☆17 Updated 4 years ago
ExpediaGroup / hiveberg
Demonstration of a Hive Input Format for Iceberg
☆26 Updated 4 years ago
phdata / pulse
phData Pulse application log aggregation and monitoring
☆13 Updated 5 years ago
rovio / rovio-ingest
An implementation of the DatasourceV2 interface of Apache Spark™ for writing Spark Datasets to Apache Druid™.
☆43 Updated 3 weeks ago
japila-books / pyspark-internals
The Internals of PySpark
☆26 Updated 6 months ago
jgrier / FilteringExample
Flink stream filtering examples
☆19 Updated 9 years ago
godatadriven / dbt-data-ai-summit
Code that was used as an example during the Data+AI Summit 2020
☆15 Updated 4 years ago
startreedata / pinot-recipes
This repository contains recipes for Apache Pinot.
☆30 Updated 4 months ago
kbastani / climate-change-analysis
This repository contains a recipe for bootstrapping a climate analysis application using Apache Pinot and Superset
☆20 Updated 4 years ago
avensolutions / cdc-at-scale-using-spark
Scalable CDC Pattern Implemented using PySpark
☆18 Updated 6 years ago
intuit / superglue
Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs …
☆158 Updated 2 years ago
maropu / spark-data-repair-plugin
Provide functionality to build statistical models to repair dirty tabular data in Spark
☆12 Updated 2 years ago
airbnb / sputnik
☆63 Updated 5 years ago
indix / sparkplug
Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
☆29 Updated 5 years ago
joerg-schneider / airtunnel
The sane way of building a data layer in Airflow
☆24 Updated 5 years ago
mikulskibartosz / check-engine
Data validation library for PySpark 3.0.0
☆33 Updated 2 years ago
projectnessie / nessie-demos
Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake.
☆29 Updated last week
polyzos / stream-processing-with-apache-flink
☆58 Updated 11 months ago
kensuio-oss / kensu-py
Observability Python library - Powered by Kensu
☆21 Updated 9 months ago
tubular / sparkly
Helpers & syntactic sugar for PySpark.
☆62 Updated 2 years ago
ZuInnoTe / spark-hadoopcryptoledger-ds
A Spark datasource for the HadoopCryptoLedger library
☆13 Updated 2 years ago
anemos-io / protobeam
☆22 Updated 6 years ago
dimajix / flowman
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi…
☆96 Updated last week
hortonworks-spark / spark-schema-registry
Schema Registry integration for Apache Spark
☆40 Updated 2 years ago
sparsecode / DaFlow
Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple…
☆26 Updated 4 years ago
tupol / spark-utils
Basic framework utilities to quickly start writing production ready Apache Spark applications
☆36 Updated 7 months ago
mfcabrera / hooqu
hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to…
☆29 Updated 7 months ago
datamindedbe / lighthouse
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an…
☆61 Updated 10 months ago
tokern / dbcat
Data Catalog for Databases and Data Warehouses
☆35 Updated last year