apicrafter / metacrafter
Metadata and data identification tool and Python library. Identifies PII, common identifiers, and language-specific identifiers. Fully customizable and flexible rules.
☆44 Updated 8 months ago
Alternatives and similar repositories for metacrafter:
Users interested in metacrafter are comparing it to the libraries listed below.
- Provide an easy way with Python to protect your data sources by searching its metadata. 🛡️ ☆16 Updated 2 weeks ago
- Registry of metadata identifier entities like UUID, GUID, person fullname, address and so on. Linked with other sources ☆17 Updated last year
- List of entity resolution software and resources. ☆63 Updated last month
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. ☆57 Updated 3 months ago
- A collection of python utility functions ☆11 Updated 9 months ago
- A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable fro… ☆27 Updated 2 years ago
- A package to build an end-to-end pipeline for detecting personally identifiable information from text. ☆44 Updated 5 years ago
- Personal Finance Project to automatically collect swiss banking transaction into a DWH and visualise it ☆26 Updated last year
- ☆69 Updated last month
- Ibis analytics, with Ibis (and more!) ☆21 Updated 6 months ago
- Python package for deduplication/entity resolution using active learning ☆77 Updated 7 months ago
- Graph Engine for Exploration and Search ☆40 Updated last year
- Application and python script to identify, remove, and/or recode personally identifiable information (PII) from field experiment datasets… ☆44 Updated 3 years ago
- quadipy is a python package to help transform structured data into RDF graph format ☆19 Updated last year
- Record matching and entity resolution at scale in Spark ☆34 Updated last year
- undatum: a command-line tool for data processing. Brings CSV simplicity to JSON lines and BSON ☆47 Updated 6 months ago
- Data Tools Subjective List ☆83 Updated last year
- Sord Data Fabric: A Vue 3 frontend with a Python WebSocket server, leveraging a distributed architecture with DeltaLake and DuckDB worker… ☆18 Updated last year
- dotML is a light-weight semantic layer written in Python. ☆34 Updated last year
- Utilities for creating ETL pipelines with mara ☆37 Updated 2 years ago
- Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observ… ☆134 Updated 2 months ago
- A small Python module containing quick utility functions for standard ETL processes. ☆34 Updated last week
- Lossless in-memory compression of pandas DataFrames and Series powered by the visions type system. Up to 10x less RAM needed for the same… ☆28 Updated 2 years ago
- Data pipelines from re-usable components ☆108 Updated 2 years ago
- Assessing whether data from database complies with reference information. ☆42 Updated 3 weeks ago
- portable Python ML-powered data bot ☆23 Updated 6 months ago
- Build your feature store with macros right within your dbt repository ☆38 Updated 2 years ago
- JedAI-WebApp is a GUI that facilitates the execution of JedAI. JedAI is an open source, high scalability toolkit that offers out-of-the-b… ☆23 Updated last year
- ☆43 Updated this week
- This repo contains information about DuckDB extensions found on GitHub. Refreshed daily ☆96 Updated this week