analyticalmonk / pyspark_nlp_workshop
Instructions and code for the workshop "From Big Data to NLP Insights: Unlocking the Power of PySpark and Spark NLP"
☆13, updated last year
Alternatives and similar repositories for pyspark_nlp_workshop:
Users who are interested in pyspark_nlp_workshop are comparing it to the repositories listed below.
- A curated list of dagster code snippets for data engineers (☆53, updated 11 months ago)
- Cost Efficient Data Pipelines with DuckDB (☆48, updated 6 months ago)
- A simple and easy to use Data Quality (DQ) tool built with Python. (☆49, updated last year)
- Build your feature store with macros right within your dbt repository (☆38, updated 2 years ago)
- dagster scikit-learn pipeline example. (☆44, updated last year)
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/ (☆12, updated 8 months ago)
- Sample project that uses Dagster, dbt, DuckDB, and Dash to visualize the Spanish car and motorcycle market (☆55, updated 2 years ago)
- Code for my "Efficient Data Processing in SQL" book. (☆56, updated 5 months ago)
- Full stack data engineering tools and infrastructure set-up (☆48, updated 3 years ago)
- DataForge helps data teams write functional transformation pipelines by leveraging software engineering principles (☆47, updated last month)
- A curated list of awesome SQLMesh resources (☆25, updated 2 months ago)
- Supporting materials/code examples for my course in data engineering for machine learning. (☆38, updated 2 years ago)
- Linear regression in SQL using dbt (☆69, updated 2 weeks ago)
- A serverless duckDB deployment at GCP (☆38, updated 2 years ago)
- Data Access Layer (☆28, updated 2 years ago)
- Quickstart for any service (☆138, updated this week)
- Analyzing hacker news in real-time with Bytewax and Proton (☆38, updated last year)
- Repo for CDC with debezium blog post (☆28, updated 4 months ago)
- Repo for orienting dbt users to the Dagster asset framework (☆53, updated 2 years ago)
- Palm CLI - the tool-belt for data teams (☆47, updated 10 months ago)
- Getting started with DuckDB, by Packt Publishing (☆48, updated 6 months ago)
- Code for the blog post on data quality with greatexpectations (☆12, updated 6 months ago)
- Quick Guides from Dremio on Several topics (☆67, updated 2 weeks ago)
- A write-audit-publish implementation on a data lake without the JVM (☆45, updated 5 months ago)
- dbt-yaml-check checks that columns defined in YAML also exist in SQL. (☆31, updated 2 years ago)
- New-generation open-source data stack (☆65, updated 2 years ago)