apicrafter / metacrafter
Metadata and data identification tool and Python library. Identifies PII, common identifiers, and language-specific identifiers. Fully customizable and flexible rules.
☆43, updated 6 months ago
Alternatives and similar repositories for metacrafter:
Users who are interested in metacrafter are comparing it to the libraries listed below.
- Provides an easy way to protect your data sources with Python by searching their metadata. ☆16, updated this week
- Registry of metadata identifier entities like UUID, GUID, person fullname, address and so on. Linked with other sources. ☆17, updated last year
- List of entity resolution software and resources. ☆53, updated 10 months ago
- Ibis analytics, with Ibis (and more!) ☆20, updated 4 months ago
- ☆65, updated 6 months ago
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. ☆55, updated last month
- A package to build an end-to-end pipeline for detecting personally identifiable information from text. ☆43, updated 5 years ago
- A curated list of dagster code snippets for data engineers. ☆53, updated 11 months ago
- Build your feature store with macros right within your dbt repository. ☆38, updated 2 years ago
- A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable fro… ☆27, updated 2 years ago
- A fully-featured multi-source data pipeline for continuously extracting knowledge from COVID-19 data. ☆21, updated 3 years ago
- Personal finance project to automatically collect Swiss banking transactions into a DWH and visualise them. ☆26, updated 10 months ago
- quadipy is a Python package to help transform structured data into RDF graph format. ☆18, updated last year
- A collection of Python utility functions. ☆12, updated 7 months ago
- Application and Python script to identify, remove, and/or recode personally identifiable information (PII) from field experiment datasets… ☆43, updated 3 years ago
- ☆36, updated 2 months ago
- dagster scikit-learn pipeline example. ☆44, updated last year
- An experimental Athena extension for DuckDB. ☆53, updated last month
- Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observ… ☆125, updated 2 weeks ago
- Sord Data Fabric: A Vue 3 frontend with a Python WebSocket server, leveraging a distributed architecture with DeltaLake and DuckDB worker… ☆18, updated last year
- Playground for using large language models in the Modern Data Stack for entity matching. ☆106, updated last year
- Scraping and querying documents for LLMs. ☆18, updated last month
- A tool to automatically infer column data types in .csv files. ☆35, updated 2 years ago
- A monorepo of many Rill example projects. ☆33, updated 2 weeks ago
- Next generation compute platform for the post-modern data stack. ☆12, updated this week
- A sweet and speedy code generator for dbt ✨ ☆25, updated 7 months ago
- CLI for running Airbyte sources & destinations locally without Airbyte server. ☆31, updated this week
- A curated list of awesome SQLMesh resources. ☆25, updated 2 months ago
- Repo for orienting dbt users to the Dagster asset framework. ☆53, updated 2 years ago
- Python package for deduplication/entity resolution using active learning. ☆78, updated 5 months ago