peerside / awesome-data-wrangling
A curated list of data wrangling resources
☆35 · Updated 6 years ago
Alternatives and similar repositories for awesome-data-wrangling:
Users who are interested in awesome-data-wrangling are comparing it to the libraries listed below:
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. ☆57 · Updated 4 months ago
- This is a compilation of Data Governance resources, examples, models and communities ☆12 · Updated 6 years ago
- A curated list of awesome open source tools and commercial products to catalog, version, and manage data ☆32 · Updated 2 years ago
- Centralized whale instance using github actions, sourcing metadata from bigquery-public-data. ☆17 · Updated 10 months ago
- A python client library for the Stitch Import API ☆42 · Updated last year
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/ ☆11 · Updated 10 months ago
- A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex) ☆141 · Updated last year
- A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable fro… ☆27 · Updated 2 years ago
- Runnable e-commerce mini data warehouse based on Python, PostgreSQL & Metabase, template for new projects ☆29 · Updated 4 years ago
- Fivetran data models for QuickBooks using dbt. ☆28 · Updated 3 months ago
- ☆38 · Updated 2 months ago
- Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code. ☆123 · Updated 3 years ago
- Awesome Business Intelligence ☆28 · Updated 6 months ago
- A modern ELT demo using airbyte, dbt, snowflake and dagster ☆27 · Updated 2 years ago
- portable Python ML-powered data bot ☆23 · Updated 6 months ago
- A curated list of dagster code snippets for data engineers ☆54 · Updated last year
- dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases. ☆57 · Updated 3 years ago
- A maximum-strength name parser for record linkage. ☆36 · Updated 2 weeks ago
- Metamapper is a data discovery and documentation platform for improving how teams understand and interact with their data. ☆79 · Updated 2 weeks ago
- A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in … ☆21 · Updated 2 years ago
- A monorepo of many Rill example projects ☆35 · Updated last week
- Open Data Stack Projects: Examples of End to End Data Engineering Projects ☆79 · Updated last year
- The Taxonomy for ETL Automation Metadata (TEAM) is a tool for design metadata management geared towards data warehouse automation. It is … ☆36 · Updated 2 months ago
- This repository contains example implementations for KNIME Analytics Platform. ☆17 · Updated 3 months ago
- Repo demonstrating a Dagster pipeline to generate Neo4j Graph ☆21 · Updated 3 years ago
- a convenient way to anonymize your data for analytics ☆22 · Updated 3 years ago
- Entity resolution, also known as Data Matching or Record linkage is the task of finding a data set that refer to the same or similar real… ☆23 · Updated last week
- A financial disclosure data extraction tool. ☆16 · Updated last year
- Run streamlit web application, test and deploy to a cloud service (GCP, AWS, Heroku) ☆14 · Updated 2 years ago
- Data models for Hubspot built using dbt. ☆35 · Updated this week