analyticalmonk / pyspark_nlp_workshop
Instructions and code for the workshop "From Big Data to NLP Insights: Unlocking the Power of PySpark and Spark NLP"
☆12 Updated last year
Related projects
Alternatives and complementary repositories for pyspark_nlp_workshop
- Cost Efficient Data Pipelines with DuckDB ☆46 Updated 3 months ago
- New-generation open-source data stack ☆61 Updated 2 years ago
- Code for data quality with greatexpectations blog ☆12 Updated 3 months ago
- Sample project that uses Dagster, dbt, DuckDB and Dash to visualize the Spanish car and motorcycle market ☆55 Updated last year
- ☆15 Updated 6 months ago
- A simple and easy to use Data Quality (DQ) tool built with Python. ☆48 Updated last year
- Full stack data engineering tools and infrastructure set-up ☆44 Updated 3 years ago
- ☆24 Updated last month
- Sord Data Fabric: A Vue 3 frontend with a Python WebSocket server, leveraging a distributed architecture with DeltaLake and DuckDB worker… ☆18 Updated 11 months ago
- Test data management tool for any data source, batch or real-time. Generate, validate and clean up data all in one tool. ☆39 Updated 3 weeks ago
- Open Data Stack Projects: Examples of End to End Data Engineering Projects ☆71 Updated last year
- Supporting materials/code examples for my course in data engineering for machine learning. ☆38 Updated 2 years ago
- rust-for-data ☆43 Updated last year
- ☆26 Updated last year
- Linear regression in SQL using dbt ☆66 Updated last month
- Analyzing hacker news in real-time with Bytewax and Proton ☆38 Updated 9 months ago
- Read Delta tables without any Spark ☆47 Updated 8 months ago
- A write-audit-publish implementation on a data lake without the JVM ☆41 Updated 3 months ago
- Ibis analytics, with Ibis (and more!) ☆19 Updated last month
- csv and flat-file sniffer built in Rust. ☆42 Updated 9 months ago
- Code snippets for Data Engineering Design Patterns book ☆40 Updated last week
- Personal Finance Project to automatically collect Swiss banking transactions into a DWH and visualise them ☆26 Updated 8 months ago
- ☆20 Updated 3 years ago
- DataForge helps data teams write functional transformation pipelines by leveraging software engineering principles ☆43 Updated this week
- ☆24 Updated 4 months ago
- lakefs-samples repository ☆71 Updated 2 weeks ago
- Build your feature store with macros right within your dbt repository ☆37 Updated last year
- This repo contains information about DuckDB extensions found on GitHub. Refreshed daily ☆82 Updated this week
- ☆21 Updated 3 months ago