mediacloud / feed_seeker
Find RSS, Atom, XML, and RDF feeds on webpages
☆31 Updated last month
Alternatives and similar repositories for feed_seeker
Users who are interested in feed_seeker compare it to the libraries listed below.
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 6 years ago
- Examples for getting started using https://case.law ☆69 Updated 3 years ago
- A helper library full of URL-related heuristics. ☆73 Updated 3 months ago
- Inspect a URL and estimate if it contains a news story ☆39 Updated last week
- Classifying the content of domains ☆58 Updated 3 months ago
- Tag news stories based on models trained on the NYT corpus. ☆42 Updated 2 years ago
- Interactive and searchable House staffer directory, based on House disbursement data. ☆30 Updated last year
- A simple command line interface to the datamade/dedupe library. ☆43 Updated 2 years ago
- Deduplicate and parse list of "dirty names" ☆23 Updated 5 years ago
- Record Linkage ToolKit (Find and link entities) ☆111 Updated 2 years ago
- Data, analytic code, and findings supporting BuzzFeed News's analysis of fentanyl and cocaine overdose deaths. ☆13 Updated 3 years ago
- Parse government documents into well formed JSON ☆75 Updated last week
- A financial disclosure data extraction tool. ☆18 Updated 2 years ago
- The documentation and scripts for the Local News Dataset ☆25 Updated 3 years ago
- The core of sunlightlabs' Data Commons project. Includes the Transparency Data site and the APIs that power TransparencyData.com and Infl… ☆38 Updated 9 years ago
- The CorpWatch API uses automated parsers to extract the subsidiary relationship information from Exhibit 21 of companies' 10-K filings wi… ☆49 Updated 10 months ago
- Crawl sites for RSS, Atom, and JSON feeds. ☆85 Updated last month
- Now included in rigour ☆152 Updated last month
- ☆11 Updated 6 years ago
- Predict age and gender from a first name ☆59 Updated 7 years ago
- Python based Wikidata framework for easy dataframe extraction ☆45 Updated 2 years ago
- Package for performing Reddit-based text analysis ☆20 Updated 6 years ago
- Public client for consuming content from the Media Cloud Online News Archive & Directory. ☆78 Updated last month
- Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery ☆58 Updated last year
- A Python library for standardizing lists of names, especially database/CSV column names. ☆23 Updated 6 years ago
- A collection of regular expressions for matching citations to state, federal, and even international law ☆40 Updated 4 years ago
- Add website scraping abilities to Datasette ☆66 Updated 2 years ago
- CoCrawler is a versatile web crawler built using modern tools and concurrency. ☆191 Updated 3 years ago
- Command-line and Python API to download PDFs directly from Sci-Hub ☆13 Updated last year
- Command-line utility to help researchers collect video metadata from Youtube API ☆29 Updated last year