mediacloud / feed_seeker
Find rss, atom, xml, and rdf feeds on webpages
☆30Updated 10 months ago
Alternatives and similar repositories for feed_seeker
Users who are interested in feed_seeker are comparing it to the libraries listed below
- A library to extract a publication date from a web page, along with a measure of the accuracy.☆41Updated 6 years ago
- Tag news stories based on models trained on the NYT corpus.☆42Updated 2 years ago
- Inspect a URL and estimate if it contains a news story☆39Updated 9 months ago
- Examples for getting started using https://case.law☆67Updated 2 years ago
- Deduplicate and parse list of 'dirty names'☆23Updated 4 years ago
- Classifying the content of domains☆57Updated 2 years ago
- A collection of projects I did while at General Assembly Singapore - as part of Data Science Immersive☆11Updated 4 years ago
- ☆11Updated 6 years ago
- A helper library full of URL-related heuristics.☆70Updated 2 months ago
- A maximum-strength name parser for record linkage.☆38Updated 2 months ago
- Automatically exported from code.google.com/p/guess-language☆52Updated last year
- A simple command line interface to the datamade/dedupe library.☆42Updated 2 years ago
- Personal news feed: search for results on Reddit/Pinboard/Twitter/Hackernews and read as RSS☆33Updated last month
- A base library for building web scrapers for statistical data, and a helper ontology for (primarily Swedish) statistical data.☆14Updated 5 months ago
- America's most comprehensive dictionary of campaign finance jargon. A free resource created by and for data journalists.☆17Updated 3 weeks ago
- Tools for tracking stories on news homepages☆48Updated 5 years ago
- The documentation and scripts for the Local News Dataset☆25Updated 3 years ago
- Presentations on Quantified Self and Self-Tracking with Python☆30Updated 2 years ago
- CoCrawler is a versatile web crawler built using modern tools and concurrency.☆191Updated 3 years ago
- Crawl sites for RSS, Atom, and JSON feeds.☆77Updated last year
- Javascript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a oneshot CLI command to extract each page…☆40Updated 11 months ago
- Advanced news feeds extractor and finder library. Helps to automatically extract news from websites without RSS/ATOM feeds☆80Updated 2 years ago
- The shared repository for Media Cloud web apps (Explorer, Source Manager, Topic Mapper)☆65Updated last year
- Record Linkage ToolKit (Find and link entities)☆110Updated 2 years ago
- NLRB data scraper by LexPredict☆12Updated 2 years ago
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023☆19Updated 2 years ago
- Various functions to make bag-of-words approaches to text analysis more user-friendly☆24Updated 8 years ago
- An R Package for Building Books or Documents using pandoc☆10Updated 3 years ago
- Parse government documents into well formed JSON☆72Updated 2 weeks ago
- Now included in rigour☆151Updated last week