pudo/normality


A tiny library for Python text normalisation. Useful for ad-hoc text processing.

☆158

Alternatives and similar repositories for normality

Users who are interested in normality are comparing it to the libraries listed below.

pudo / banal
Commons of stupid, simple Python micro functions. Pull requests very welcome.
☆21 · Jun 20, 2026 · Updated last month
opensanctions / fingerprints
Now included in rigour
☆150 · Nov 24, 2025 · Updated 8 months ago
opensanctions / datapatch
A Python library for defining rule-based overrides on messy data
☆18 · Nov 24, 2025 · Updated 8 months ago
alexbyrnes / FCC-Political-Ads_The-Code
Code for extracting data from a large number of PDFs, particularly FCC political ad documents
☆15 · Oct 26, 2017 · Updated 8 years ago
pudo-attic / archivekit
ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3.
☆15 · May 2, 2015 · Updated 11 years ago
deadlyforcedb / data-recipes
A small repo of notes and scripts for collecting data on U.S. deadly force police incidents
☆10 · Aug 9, 2015 · Updated 10 years ago
pudo / prefixdate
Provide partial dates and retain the date precision through processing
☆14 · Aug 4, 2025 · Updated 11 months ago
rflow / demystifying-d3
Code for the NICAR 2014 d3 workshop
☆16 · Feb 27, 2014 · Updated 12 years ago
dataresearchcenter / investigraph
ETL pipeline, graphical explorer, and general toolbox for investigations with Follow the Money data
☆28 · Jul 15, 2025 · Updated last year
okfn / helmut
A generic Google Refine Reconciliation API implementation
☆20 · Jan 11, 2012 · Updated 14 years ago
openspending / spendb
Next-gen web application for public finance data warehouses, formerly OpenSpending
☆57 · Jul 6, 2022 · Updated 4 years ago
The-Politico / us-elections
US election metadata, packaged as Python!
☆10 · Mar 16, 2022 · Updated 4 years ago
jfilter / clean-text
🧹 Python package for text cleaning
☆1,026 · May 15, 2026 · Updated 2 months ago
anthonydb / CDC-flu-scraper
Python scraper to get weekly CDC flu surveillance data
☆25 · Dec 2, 2014 · Updated 11 years ago
alexbyrnes / FCC-Political-Ads
Archive of political ad data from the Federal Communications Commission
☆21 · Oct 25, 2017 · Updated 8 years ago
cjdd3b / citizen-quotes
A project to demonstrate maximum entropy models for extracting quotes from news articles in Python.
☆26 · Aug 27, 2012 · Updated 13 years ago
opensanctions / rigour
Data cleaning and validation functions for names, languages, identifiers, etc.
☆64 · Updated this week
alephdata / alephclient
API client for Aleph, supports bulk entity and document upload.
☆30 · Mar 5, 2026 · Updated 4 months ago
alephdata / memorious
Lightweight web scraping toolkit for documents and structured data.
☆316 · May 20, 2026 · Updated 2 months ago
pudo / jsongraph
Little JSON objects want to be graphs, too!
☆17 · Oct 2, 2015 · Updated 10 years ago
sannuta / news-atom-lite
Extract structured events and atoms (sentence-level knowledge units) from news articles using any language model.
☆16 · May 12, 2026 · Updated 2 months ago
agussman / hrc-email
Tools for analyzing the Hillary Clinton emails
☆13 · Apr 24, 2016 · Updated 10 years ago
pudo / datafreeze
Dump (freeze) SQL query results from a database into a selection of file formats
☆92 · May 8, 2019 · Updated 7 years ago
nprapps / ucr-clearance-parser
Parse Uniform Crime Reporting clearance data
☆13 · Oct 2, 2015 · Updated 10 years ago
guardian / giant
Platform for journalists to search, analyse, categorise and share unstructured data
☆59 · Updated this week
opensanctions / nomenklatura
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
☆259 · Updated this week
newsdev / nyt-docket
A Python client for parsing SCOTUS cases from the granted/noted and orders dockets. https://pypi.python.org/pypi/nyt-docket
☆16 · Sep 28, 2017 · Updated 8 years ago
pudo / pgcsv
Load CSV files into Postgres without explicit schema creation.
☆79 · Jun 26, 2021 · Updated 5 years ago
dataresearchcenter / datasets
A collaborative collection of structured datasets and document collections that are common to use within "Follow the Money" investigation…
☆16 · May 13, 2026 · Updated 2 months ago
seanherron / sheeet
Sheeet is a simple utility to take a number of Excel files and convert them to CSV.
☆20 · Feb 6, 2014 · Updated 12 years ago
newsdev / nyt-clerk
A set of Python modules for downloading, parsing, and outputting data related to the Supreme Court.
☆40 · Jun 20, 2019 · Updated 7 years ago
allisson / python-preparer
Simple way to build a new dict based on fields declaration
☆15 · May 7, 2019 · Updated 7 years ago
sunlightlabs / sotumachine
State of the Unions for the rest of us
☆19 · Jan 16, 2015 · Updated 11 years ago
mtdukes / how-to
A collection of cheat sheets for remembering common commands and tips for data journalism work.
☆39 · Oct 12, 2023 · Updated 2 years ago
washingtonpost / data-equitable-sharing-spending
Obtained in December 2014 through a Freedom of Information request
☆15 · Jan 29, 2016 · Updated 10 years ago
stefanw / Bundestagger
Django project for annotating and referencing parts of the parliament protocols of the German Bundestag.
☆15 · Jul 21, 2010 · Updated 16 years ago
lyeoni / prenlp
Preprocessing Library for Natural Language Processing
☆164 · Dec 6, 2022 · Updated 3 years ago
newsdev / nyt-pyfec
A Python library for downloading, parsing and cleaning Federal Election Commission filings.
☆28 · Jan 30, 2024 · Updated 2 years ago
blacklight / Takk
Speech recognition in Python made easy and flexible
☆11 · Sep 12, 2015 · Updated 10 years ago