darenasc / aeda
Build a data catalog by running a single line of code
☆16 Updated 4 months ago
Alternatives and similar repositories for aeda:
Users who are interested in aeda are comparing it to the repositories listed below
- Using the Parquet file format with Python ☆15 Updated last year
- Pandas helper functions ☆30 Updated last year
- How to do data science with Optimus, Spark and Python. ☆19 Updated 5 years ago
- ☕⛵ WIP PySpark dependency management ☆22 Updated 6 years ago
- Record matching and entity resolution at scale in Spark ☆32 Updated last year
- Personal finance project to automatically collect Swiss banking transactions into a DWH and visualise them ☆26 Updated 10 months ago
- Automated Exploratory Data Analysis. Simplifying Data Exploration ☆34 Updated 4 years ago
- Dagster scikit-learn pipeline example. ☆44 Updated last year
- Cost Efficient Data Pipelines with DuckDB ☆48 Updated 5 months ago
- Build your feature store with macros right within your dbt repository ☆38 Updated 2 years ago
- Generating Realistic Synthetic Data ☆32 Updated 11 months ago
- Repo demonstrating a Dagster pipeline to generate a Neo4j graph ☆21 Updated 3 years ago
- Ibis analytics, with Ibis (and more!) ☆20 Updated 4 months ago
- A serverless DuckDB deployment at GCP ☆38 Updated 2 years ago
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆113 Updated 10 months ago
- Blog post on ETL pipelines with Airflow ☆23 Updated 4 years ago
- A maximum-strength name parser for record linkage. ☆36 Updated 5 months ago
- Analytics on Apache Projects for Diversity ☆18 Updated 5 years ago
- A scikit-learn compatible estimator based on business rules, with an interactive dashboard included ☆28 Updated 3 years ago
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/ ☆12 Updated 8 months ago
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ☆26 Updated last month
- The sane way of building a data layer in Airflow ☆24 Updated 5 years ago
- Fake Pandas / PySpark DataFrame creator ☆44 Updated 10 months ago
- A utility for labeling clusters of text data. ☆28 Updated 3 years ago
- A monorepo of many Rill example projects ☆33 Updated 2 weeks ago
- A simple and easy-to-use Data Quality (DQ) tool built with Python. ☆49 Updated last year
- Supporting materials/code examples for my course in data engineering for machine learning. ☆38 Updated 2 years ago
- Evaluation Matrix for Change Data Capture ☆24 Updated 5 months ago