edwardcooper / piidetect
A package to build an end-to-end pipeline for detecting personally identifiable information from text.
☆45 Updated 6 years ago
Alternatives and similar repositories for piidetect
Users who are interested in piidetect are comparing it to the libraries listed below.
- Application and python script to identify, remove, and/or recode personally identifiable information (PII) from field experiment datasets… ☆46 Updated 3 years ago
- A research python package for detecting, categorizing, and assessing the severity of personal identifiable information (PII) ☆87 Updated last year
- Provide an easy way with Python to protect your data sources by searching its metadata. 🛡️ ☆17 Updated last month
- Playground for using large language models into the Modern Data Stack for entity matching ☆108 Updated 2 years ago
- Metadata and data identification tool and Python library. Identifies PII, common identifiers, language specific identifiers. Fully custom… ☆44 Updated 11 months ago
- A project to build a machine learning pipeline to detect personal identifiable information (PII) ☆16 Updated 2 years ago
- Async bulk data ingestion and querying in various document, graph and vector databases via their Python clients ☆36 Updated last year
- This package features data-science related tasks for developing new recognizers for Presidio. It is used for the evaluation of the entire… ☆221 Updated this week
- Library for identification, anonymization and de-anonymization of PII data ☆22 Updated 2 years ago
- ☆33 Updated 3 years ago
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. ☆59 Updated 2 weeks ago
- Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub ☆316 Updated last year
- ☆39 Updated 4 months ago
- S3 vector database for LLM Agents and RAG. ☆43 Updated last year
- Python package for deduplication/entity resolution using active learning ☆80 Updated 10 months ago
- A small Python module containing quick utility functions for standard ETL processes. ☆35 Updated this week
- A fully-featured multi-source data pipeline for continuously extracting knowledge from COVID-19 data. ☆21 Updated 4 years ago
- ☆47 Updated 2 years ago
- MirrorDataGenerator is a python tool that generates synthetic data based on user-specified causal relations among features in the data. I… ☆23 Updated 3 years ago
- ☆49 Updated 2 weeks ago
- Language detection using Spacy and Fasttext ☆55 Updated last year
- ☆30 Updated 3 years ago
- Package that returns a company embedding given a company name ☆46 Updated 5 years ago
- Streamlit application to explore Snowflake Tables ☆41 Updated last year
- Generating Realistic Synthetic Data ☆38 Updated last year
- Convert pandas DataFrame manipulations to sql query string ☆45 Updated 4 years ago
- A personal knowledge base that I can dump information to and help me learn ☆24 Updated last month
- dotML is a light-weight semantic layer written in Python. ☆36 Updated last year
- ☆57 Updated 3 years ago
- Build and deploy a serverless data pipeline on AWS with no effort. ☆111 Updated 2 years ago