mediacloud / feed_seeker
Find RSS, Atom, XML, and RDF feeds on webpages
☆30 · Updated 5 months ago
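feed_seeker's own implementation and API are not shown on this page. As a rough illustration of the task it addresses, here is a minimal sketch of the common feed autodiscovery technique: scan a page's HTML for `<link rel="alternate">` tags whose `type` is a feed MIME type, then resolve each `href` against the page URL. It assumes the HTML has already been fetched. `FeedLinkParser`, `find_feed_links`, and the `FEED_TYPES` set are hypothetical names chosen for this example, not part of feed_seeker.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical set of MIME types treated as feeds in this sketch.
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/rdf+xml",
    "application/xml",
    "text/xml",
}


class FeedLinkParser(HTMLParser):
    """Collect absolute URLs from <link rel="alternate" type="<feed type>"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here via handle_startendtag.
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        # Drop parameters such as "; charset=utf-8" from the type attribute.
        mime = attr.get("type", "").lower().split(";")[0].strip()
        href = attr.get("href", "").strip()
        if "alternate" in rels and mime in FEED_TYPES and href:
            url = urljoin(self.base_url, href)  # resolve relative hrefs
            if url not in self.feeds:
                self.feeds.append(url)


def find_feed_links(html, base_url):
    """Return feed URLs advertised in the page's <link> tags, in document order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.feeds
```

For example, on a page at `https://example.com/blog/` containing `<link rel="alternate" type="application/rss+xml" href="/feed.xml">`, `find_feed_links` returns `["https://example.com/feed.xml"]`. Pages that advertise no feed through `<link>` tags yield an empty list. A dedicated tool would likely need additional heuristics for those pages, which this sketch does not attempt.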
Alternatives and similar repositories for feed_seeker:
Users who are interested in feed_seeker are comparing it to the libraries listed below.
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 · Updated 5 years ago
- Tag news stories based on models trained on the NYT corpus. ☆42 · Updated 2 years ago
- A financial disclosure data extraction tool. ☆14 · Updated last year
- how hard is it to get a list of all local news sites in the United States (LOL) ☆8 · Updated 4 years ago
- Presentations on Quantified Self and Self-Tracking with Python ☆29 · Updated 2 years ago
- Sidewall is a Python library for interacting with the Dimensions search API. ☆17 · Updated 6 months ago
- ☆12 · Updated 5 years ago
- A base library for building web scrapers for statistical data, and a helper ontology for (primarily Swedish) statistical data. ☆13 · Updated 3 weeks ago
- America's most comprehensive dictionary of campaign finance jargon. A free resource created by and for data journalists. ☆17 · Updated 2 weeks ago
- A classifier that distinguishes political from non-political news articles. ☆30 · Updated last year
- Examples for getting started using https://case.law ☆65 · Updated 2 years ago
- Matrix-based News Aggregation to Explore Media Bias ☆20 · Updated 6 years ago
- A Docker image for the CLIFF geolocation software. ☆15 · Updated 3 years ago
- Virtual patent marking crawler at iproduct.epfl.ch ☆14 · Updated 7 years ago
- Tools for tracking stories on news homepages ☆48 · Updated 5 years ago
- NSS Capstone project to use natural language modeling, classification, and information extraction to get the exact employee count values … ☆15 · Updated 6 years ago
- API - extract a list of keywords from a text. ☆18 · Updated 7 years ago
- A maximum-strength name parser for record linkage. ☆36 · Updated last month
- Inspect a URL and estimate if it contains a news story ☆39 · Updated 3 months ago
- Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery ☆55 · Updated 8 months ago
- A Python library for defining rule-based overrides on messy data ☆13 · Updated 4 months ago
- The Web Scraping Sandbox ☆14 · Updated 2 months ago
- Source for lemon-model.net ☆11 · Updated 3 years ago
- Javascript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a oneshot CLI command to extract each page… ☆40 · Updated 6 months ago
- A simple command line interface to the datamade/dedupe library. ☆42 · Updated 2 years ago
- scraper for facebook, gab, google and tiktok ☆22 · Updated 8 months ago
- A Google Trends Analytics Package ☆13 · Updated 9 months ago
- Scrape various open data directories to create an index of what's available out there ☆36 · Updated last month
- Named-Entity Recognition extension for OpenRefine ☆26 · Updated 2 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆25 · Updated 7 years ago