jplusplus / statscraper
A base library for building web scrapers for statistical data, and a helper ontology for (primarily Swedish) statistical data.
☆13 Updated last month
Alternatives and similar repositories for statscraper:
Users who are interested in statscraper are comparing it to the libraries listed below.
- API client for Aleph, supports bulk entity and document upload. ☆28 Updated 6 months ago
- Materials to reproduce findings in our story, "Google’s Top Search Result? Surprise! It’s Google" ☆34 Updated 4 years ago
- Service for creating Twitter datasets for research and archiving. ☆26 Updated 2 years ago
- An alpha project combining beneficial ownership and contracting data ☆13 Updated 3 years ago
- How Quartz used AI to help reporters search the Mauritius Leaks ☆47 Updated 5 years ago
- Tools for tracking stories on news homepages ☆48 Updated 5 years ago
- scraper for facebook, gab, google and tiktok ☆21 Updated 9 months ago
- Ask questions about government data. ☆37 Updated 6 years ago
- Jupyter notebook + Code for reproducing Reddit Subreddit graphs ☆17 Updated 8 years ago
- Uses NLP methods to parse and classify contracts from The City of New Orleans ☆10 Updated 10 years ago
- ☆11 Updated 5 years ago
- OpenRefine for Social Science Data ☆24 Updated this week
- etl pipeline, graphical explorer and general toolbox for investigations with follow the money data ☆21 Updated last year
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- Extract networks of entities from journalistic reporting ☆48 Updated last year
- ☆14 Updated 8 years ago
- Docker Container for a Make-based, PDF extraction using OCR ☆12 Updated 8 months ago
- Scraping Assisted by Learning ☆35 Updated last week
- Find rss, atom, xml, and rdf feeds on webpages ☆30 Updated 6 months ago
- Repository for the research into radical and extremist infospheres on YouTube ☆60 Updated 6 years ago
- ☆23 Updated 9 years ago
- Data and scripts relating to the publishing of the House expenditure reports, and hopefully the Senate's in future. ☆24 Updated 4 years ago
- A tool to allow US addresses to be geocoded/georeferenced easily, without using Python or the command line or paid services or anything. ☆18 Updated 2 years ago
- Python wrapper for a C++ Double Metaphone ☆15 Updated 2 years ago
- Research-grade URL expansion for Python. ☆27 Updated 6 years ago
- A Python library for defining rule-based overrides on messy data ☆13 Updated 4 months ago
- A curated list of resources for (aspiring) data journalists ☆24 Updated 4 years ago
- A financial disclosure data extraction tool. ☆15 Updated last year
- how hard is it to get a list of all local news sites in the United States (LOL) ☆8 Updated 4 years ago
- A curated list of awesome data sources related to elections, electoral reforms, and democratic political systems. ☆75 Updated 3 years ago