jamesturk / spatula
A modern Python library for writing maintainable web scrapers.
☆245 · Updated 6 months ago
Alternatives and similar repositories for spatula:
Users who are interested in spatula are comparing it to the libraries listed below.
- Find your broken links, so users don't. ☆66 · Updated 2 weeks ago
- ⛏ a library for scraping unreliable pages ☆210 · Updated 4 months ago
- A tiny library for Python text normalisation. Useful for ad-hoc text processing. ☆146 · Updated last week
- The data journalism platform with built in training ☆305 · Updated last month
- General programming utilities from Pew Research Center ☆69 · Updated 2 years ago
- ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease. ☆99 · Updated last year
- Guess gender from first name in Python 2 and 3 ☆131 · Updated 2 years ago
- A Python module for accessing the Open States API ☆29 · Updated last year
- Provide partial dates and retain the date precision through processing ☆13 · Updated 2 years ago
- A general purpose tool for text-based crosswalking ☆103 · Updated 9 months ago
- Text and statistics utilities from Pew Research Center ☆82 · Updated 2 years ago
- Easily download U.S. census maps ☆33 · Updated last year
- Python library and CLI you can use to move relational data from one place to another - DBs/CSV/gsheets/dataframes/... ☆37 · Updated 7 months ago
- A new python implementation of an old classic ☆14 · Updated 2 years ago
- Lightweight web scraping toolkit for documents and structured data. ☆310 · Updated last year
- Fuzzy matches and merging of datasets in pandas using csvmatch ☆74 · Updated 4 years ago
- A Python implementation of Lunr.js 🌖 ☆194 · Updated 2 weeks ago
- Clean up all those Pythons crawling around your computer ☆15 · Updated last year
- Pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more. ☆106 · Updated 2 months ago
- a python parser for the .fec file format ☆44 · Updated last year
- A maximum-strength name parser for record linkage. ☆36 · Updated 5 months ago
- Group thousands of similar spreadsheet or database text entries in seconds ☆156 · Updated last year
- Tools for generating CSV and other flat versions of the structured data ☆105 · Updated last week
- Opinionated cookiecutter template for creating a new Python library ☆186 · Updated 4 months ago
- A light-weight wrapper for the Datawrapper API. ☆61 · Updated 6 months ago
- Clean US addresses following USPS pub 28 and RESO guidelines ☆208 · Updated 11 months ago
- Opinionated coding guidelines and best practices in Python ☆207 · Updated 4 years ago
- Add website scraping abilities to Datasette ☆62 · Updated last year
- 📚 Doing all sorts of things, the DataMade way ☆92 · Updated last week
- Utility library to turn country names into ISO two-letter codes ☆66 · Updated last month