mshearer0 / HandsOnEntityResolution
This repository accompanies Hands On Entity Resolution by O'Reilly
☆24 Updated last year
Alternatives and similar repositories for HandsOnEntityResolution
Users who are interested in HandsOnEntityResolution are comparing it to the repositories listed below
- Intro to Polars Tutorial ☆23 Updated 2 years ago
- Code and materials for Effective Polars book ☆83 Updated last year
- Scripts and datasets for the O'Reilly book Python Polars: The Definitive Guide ☆241 Updated 2 months ago
- Course materials for our "Getting Started with NLP and spaCy" course at Talk Python ☆38 Updated 4 months ago
- This repo is for LinkedIn Learning course: Data Pipeline Automation with GitHub Actions ☆50 Updated this week
- ☆21 Updated 11 months ago
- Data Analysis with Polars, Published by Packt ☆32 Updated 9 months ago
- A FastMCP tool to search and retrieve Polars API documentation. ☆64 Updated last month
- Good Practice Tables - an XlsxWriter wrapper to write consistently formatted statistical tables to Excel. ☆39 Updated 2 weeks ago
- Cost Efficient Data Pipelines with DuckDB ☆55 Updated 2 months ago
- Files for my "Pandas Workout" book ☆83 Updated last year
- (WIP) Getting started with Docker - An introduction to Docker with data science and engineering applications ☆129 Updated last year
- Interactive notebooks containing demonstration code of the splink library ☆38 Updated last year
- Duke MIDS: Data Engineering and DataOps Course ☆67 Updated 6 months ago
- Datasets for ML, Analysis, etc ☆62 Updated 2 months ago
- A Python Environment Template for VScode with UV ☆69 Updated 3 weeks ago
- Graph Data Modeling in Python, by Packt Publishing ☆42 Updated 4 months ago
- Full stack data engineering tools and infrastructure set-up ☆53 Updated 4 years ago
- ☆8 Updated last year
- Code for my "Efficient Data Processing in SQL" book. ☆57 Updated 11 months ago
- sktime - python toolbox for time series: pipelines and transformers ☆24 Updated 2 years ago
- A tutorial for setting an SQL code generator with the OpenAI API ☆247 Updated last year
- Blocking records for record linkage and data deduplication based on ANN algorithms in Python. ☆13 Updated last week
- It's all in the name ☆78 Updated 2 years ago
- Apache Airflow Best Practices, published by Packt ☆43 Updated 8 months ago
- A Beginner's Guide to DuckDB's Python Client ☆42 Updated 9 months ago
- Code repository for the "PySpark in Action" book ☆204 Updated last month
- Materials for the Deploy and Monitor ML Pipelines with Python, Docker and GitHub Actions workshop at the PyData NYC 2024 conference ☆82 Updated this week
- Essential PySpark for Scalable Data Analytics, published by Packt ☆45 Updated 2 years ago
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/ ☆12 Updated last year