analyticalmonk / pyspark_nlp_workshop
Instructions and code for the workshop "From Big Data to NLP Insights: Unlocking the Power of PySpark and Spark NLP"
☆13 Updated last year
Alternatives and similar repositories for pyspark_nlp_workshop:
Users who are interested in pyspark_nlp_workshop are comparing it to the repositories listed below:
- Sample project that uses Dagster, dbt, DuckDB and Dash to visualize the Spanish car and motorcycle market ☆58 Updated 2 years ago
- A simple and easy-to-use Data Quality (DQ) tool built with Python. ☆50 Updated last year
- ☆21 Updated 7 months ago
- New-generation open-source data stack ☆65 Updated 2 years ago
- A write-audit-publish implementation on a data lake without the JVM ☆46 Updated 7 months ago
- A Higher-Level, Composable SQL ☆43 Updated this week
- ☆80 Updated 9 months ago
- Demos of Materialize, the operational data warehouse. ☆51 Updated last month
- Public source code for the Batch Processing with Apache Beam (Python) online course ☆18 Updated 4 years ago
- Delta Lake helper methods. No Spark dependency. ☆23 Updated 7 months ago
- Cost Efficient Data Pipelines with DuckDB ☆51 Updated 8 months ago
- 🚀 Stream inferences of real-time ML models in production to any data lake (Experimental) ☆80 Updated 2 years ago
- Query Iceberg in Trino, with Nessie as the catalog and MinIO in place of AWS S3 ☆18 Updated 10 months ago
- A serverless DuckDB deployment on GCP ☆39 Updated 2 years ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆113 Updated last year
- ☆20 Updated 3 years ago
- Dagster scikit-learn pipeline example. ☆44 Updated 2 years ago
- Sord Data Fabric: A Vue 3 frontend with a Python WebSocket server, leveraging a distributed architecture with DeltaLake and DuckDB worker… ☆18 Updated last year
- Cloud services and hosting for Python web apps ☆30 Updated 2 years ago
- Repo for the CDC with Debezium blog post ☆28 Updated 6 months ago
- Test data management tool for any data source, batch or real-time. Generate, validate and clean up data all in one tool. ☆51 Updated last month
- Example projects built on MotherDuck ☆25 Updated last month
- A Snowflake GPT Demo using SQLAlchemy ☆23 Updated last year
- Ibis analytics, with Ibis (and more!) ☆21 Updated 6 months ago
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/ ☆11 Updated 10 months ago
- A high-performance data streaming system using DuckDB and Apache Arrow Flight. ☆76 Updated last month
- Linear regression in SQL using dbt ☆69 Updated 2 months ago
- Quickstart for any service ☆141 Updated this week
- Build a REST API on top of your data warehouse ☆42 Updated 2 years ago
- lakefs-samples repository ☆79 Updated last week