apicrafter / metacrafter
Metadata and data identification tool and Python library. Identifies PII, common identifiers, and language-specific identifiers. Fully customizable and flexible rules.
☆45 Updated 3 weeks ago
Alternatives and similar repositories for metacrafter
Users who are interested in metacrafter are comparing it to the libraries listed below.
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. ☆66 Updated 2 weeks ago
- Provide an easy way with Python to protect your data sources by searching its metadata. ☆18 Updated last week
- Python+VueJS application to load, explore, combine, transform and deliver data ☆102 Updated 11 months ago
- List of entity resolution software and resources. ☆107 Updated 11 months ago
- Swiple enables you to easily observe, understand, validate and improve the quality of your data ☆84 Updated this week
- PyPi module for Graphlet AI Knowledge Graph Factory ☆33 Updated 2 years ago
- undatum: a command-line tool for data processing. Brings CSV simplicity to NDJSON, BSON, XML and other data files ☆51 Updated last week
- Convert monolithic Jupyter notebooks into maintainable Ploomber pipelines. ☆79 Updated last year
- Python package for deduplication/entity resolution using active learning ☆83 Updated last year
- A curated list of example code to collect data from Web APIs using DataPrep.Connector. ☆36 Updated 2 years ago
- Data pipelines from re-usable components ☆107 Updated 2 months ago
- A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives. ☆110 Updated 2 weeks ago
- Toolkit for graph-relational data across space and time ☆118 Updated this week
- ☆22 Updated 2 weeks ago
- ODD Specification is a universal open standard for collecting metadata. ☆146 Updated last year
- A tool to automatically infer columns data types in .csv files ☆37 Updated 3 years ago
- An automation tool to refactor Jupyter Notebooks to Python modules, with code dependency analysis. ☆12 Updated 11 months ago
- Data processing and modelling framework for automating tasks (incl. Python & SQL transformations). ☆120 Updated 4 months ago
- Python Data Anonymization & Masking Library For Data Science Tasks ☆282 Updated 2 years ago
- A maximum-strength name parser for record linkage. ☆39 Updated 4 months ago
- A monorepo of many Rill example projects ☆47 Updated last week
- Scripts to make specific datasets cleaner and more convenient ☆42 Updated 3 years ago
- Data Tools Subjective List ☆89 Updated 2 years ago
- A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex) ☆141 Updated 2 years ago
- A small Python module containing quick utility functions for standard ETL processes. ☆37 Updated last month
- Tutorials for Fugue - A unified interface for distributed computing. Fugue executes SQL, Python, and Pandas code on Spark and Dask withou… ☆114 Updated 2 months ago
- Cloud-agnostic Python API ☆60 Updated last year
- ☆92 Updated last year
- dagster scikit-learn pipeline example. ☆46 Updated 2 years ago
- This repo contains information about DuckDB extensions found on GitHub. Refreshed daily ☆107 Updated this week