mediacloud / feed_seeker
Find rss, atom, xml, and rdf feeds on webpages
☆30 · Updated 6 months ago
Alternatives and similar repositories for feed_seeker:
Users who are interested in feed_seeker are comparing it to the libraries listed below.
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 · Updated 5 years ago
- how hard is it to get a list of all local news sites in the United States (LOL) ☆8 · Updated 5 years ago
- Tag news stories based on models trained on the NYT corpus. ☆42 · Updated 2 years ago
- Some tools to help analyze the twitter archive ☆62 · Updated 8 months ago
- ☆11 · Updated 5 years ago
- A demonstration transnational register of beneficial ownership data from the UK, Denmark, Slovakia and Armenia ☆17 · Updated 6 months ago
- A base library for building web scrapers for statistical data, and a helper ontology for (primarily Swedish) statistical data. ☆13 · Updated 2 months ago
- A browser extension providing Open Access bibliographical services ☆17 · Updated 2 years ago
- NSS Capstone project to use natural language modeling, classification, and information extraction to get the exact employee count values … ☆15 · Updated 6 years ago
- Sidewall is a Python library for interacting with the Dimensions search API. ☆17 · Updated 7 months ago
- Materials to reproduce findings in our story, "Google’s Top Search Result? Surprise! It’s Google" ☆34 · Updated 4 years ago
- Presentations on Quantified Self and Self-Tracking with Python ☆30 · Updated 2 years ago
- Inspect a URL and estimate if it contains a news story ☆39 · Updated 5 months ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆56 · Updated last year
- Personal news feed: search for results on Reddit/Pinboard/Twitter/Hackernews and read as RSS ☆31 · Updated last month
- Datasette plugin providing instructions for exporting data to Jupyter or Observable ☆12 · Updated last year
- A maximum-strength name parser for record linkage. ☆37 · Updated this week
- Automatically exported from code.google.com/p/guess-language ☆53 · Updated last year
- A PDF classifier ensemble with REST API service ☆23 · Updated 4 years ago
- NLRB data scraper by LexPredict ☆12 · Updated 2 years ago
- Wikidata properties ☆9 · Updated last year
- Jupyter notebook + Code for reproducing Reddit Subreddit graphs ☆18 · Updated 8 years ago
- Simple tools for summarizing .mbox email archives. ☆11 · Updated 5 years ago
- Tools for tracking stories on news homepages ☆48 · Updated 5 years ago
- A simple command line interface to the datamade/dedupe library. ☆42 · Updated 2 years ago
- Virtual patent marking crawler at iproduct.epfl.ch ☆14 · Updated 7 years ago
- Deduplicate and parse list of `dirty names' ☆21 · Updated 4 years ago
- A Python library for defining rule-based overrides on messy data ☆13 · Updated 3 weeks ago
- Advanced news feeds extractor and finder library. Helps to automatically extract news from websites without RSS/ATOM feeds ☆80 · Updated 2 years ago
- Service for creating Twitter datasets for research and archiving. ☆26 · Updated 2 years ago