ververica / pyflink-nlp
Self-contained demo using PyFlink with Gensim+spaCy to find topics in the Flink User Mailing List. All you need is Docker! 🐳
☆10 Updated 4 years ago
Alternatives and similar repositories for pyflink-nlp
Users who are interested in pyflink-nlp are comparing it to the libraries listed below.
- Self-contained demo using PyFlink with Gensim+spaCy to find topics in the Flink User Mailing List. All you need is Docker! 🐳 ☆21 Updated 3 years ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆29 Updated 2 weeks ago
- A bridge to Apache Atlas for provenance metadata created in course of using Apache NiFi ☆15 Updated 2 years ago
- phData Pulse application log aggregation and monitoring ☆13 Updated 5 years ago
- Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple… ☆26 Updated 3 years ago
- A curated list of awesome PrestoDB / Trino software, libraries, tools and resources ☆17 Updated 3 years ago
- Data validation library for PySpark 3.0.0 ☆33 Updated 2 years ago
- Friendly ML feature store ☆45 Updated 3 years ago
- Projects developed by Domino's R&D team ☆76 Updated 3 years ago
- PySpark phonetic and string matching algorithms ☆39 Updated last year
- The sane way of building a data layer in Airflow ☆24 Updated 5 years ago
- ☆21 Updated 2 years ago
- An implementation of the DatasourceV2 interface of Apache Spark™ for writing Spark Datasets to Apache Druid™. ☆41 Updated 3 weeks ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection ☆18 Updated 8 years ago
- Demonstration of a Hive Input Format for Iceberg ☆26 Updated 4 years ago
- ☆63 Updated 5 years ago
- This repository contains recipes for Apache Pinot. ☆30 Updated 2 months ago
- Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Stor… ☆41 Updated 2 years ago
- ☆54 Updated 9 months ago
- A library on top of either pex or conda-pack to make your Python code easily available on a cluster ☆45 Updated 5 months ago
- Helpers & syntactic sugar for PySpark. ☆62 Updated last year
- Tools for faster and optimized interaction with Teradata and large datasets. ☆17 Updated 6 years ago
- spark-drools tutorials ☆16 Updated last year
- HDFS Automatic Snapshot Service for Linux ☆12 Updated 8 years ago
- Read Delta tables without any Spark ☆47 Updated last year
- Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines an… ☆61 Updated 8 months ago
- ☆14 Updated 3 months ago
- Takes a kafka stream into spark, apply transformations and sink into Druid. Everything Dockerised. ☆30 Updated last year
- Apache StreamPipes - A self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data strea… ☆26 Updated 2 years ago
- Python Streaming Pipelines with Beam on Flink - Demo ☆14 Updated 2 years ago