apicrafter / metacrafter
Metadata and data identification tool and Python library. Identifies PII, common identifiers, and language-specific identifiers. Fully customizable and flexible rules.
☆44 Updated 10 months ago
Alternatives and similar repositories for metacrafter
Users who are interested in metacrafter are comparing it to the libraries listed below.
- Provide an easy way with Python to protect your data sources by searching its metadata. ☆17 Updated this week
- Registry of metadata identifier entities like UUID, GUID, person fullname, address and so on. Linked with other sources ☆17 Updated last year
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. ☆57 Updated last month
- List of entity resolution software and resources. ☆66 Updated 2 months ago
- Python package for deduplication/entity resolution using active learning ☆79 Updated 8 months ago
- ☆21 Updated 8 months ago
- A fully-featured multi-source data pipeline for continuously extracting knowledge from COVID-19 data. ☆21 Updated 3 years ago
- An experimental Athena extension for DuckDB ☆54 Updated 4 months ago
- A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives. ☆105 Updated this week
- quadipy is a python package to help transform structured data into RDF graph format ☆19 Updated 2 years ago
- A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable fro… ☆27 Updated 2 years ago
- PyPi module for Graphlet AI Knowledge Graph Factory ☆29 Updated 2 years ago
- Web based SQL query editor for your files, databases and cloud storage data. ☆30 Updated 6 months ago
- Async bulk data ingestion and querying in various document, graph and vector databases via their Python clients ☆36 Updated last year
- Application and python script to identify, remove, and/or recode personally identifiable information (PII) from field experiment datasets… ☆45 Updated 3 years ago
- Graph Engine for Exploration and Search ☆40 Updated last year
- JumpSpark - A modern cookiecutter template for pyspark projects with batteries included. ☆10 Updated 2 years ago
- ☆70 Updated 2 months ago
- The Modern Data Stack in a Python package ☆49 Updated last year
- SQL query executor on remote DuckDB instance using Apache Arrow Flight RPC through Streamlit Web interface. ☆14 Updated 6 months ago
- Ibis analytics, with Ibis (and more!) ☆21 Updated 7 months ago
- A package to build an end-to-end pipeline for detecting personally identifiable information from text. ☆45 Updated 5 years ago
- Scripts to make specific datasets cleaner and more convenient ☆41 Updated 2 years ago
- Batteries included toolkit for data engineering. ☆34 Updated 4 months ago
- Swiple enables you to easily observe, understand, validate and improve the quality of your data ☆83 Updated this week
- Mapping of DWH database tables to business entities, attributes & metrics in Python, with automatic creation of flattened tables ☆74 Updated last year
- Record matching and entity resolution at scale in Spark ☆34 Updated last year
- Dash Component created from ukrbublik/react-awesome-query-builder ☆12 Updated last week
- Making Time Speak! ☆29 Updated 3 months ago
- asyncio bridge to the duckdb library ☆41 Updated 2 years ago