mediacloud / metadata-lib
How Media Cloud approaches extracting metadata from online news stories
☆15 Updated 6 months ago
Alternatives and similar repositories for metadata-lib
Users who are interested in metadata-lib are comparing it to the libraries listed below
- Fast and robust date extraction from web pages, with Python or on the command-line ☆133 Updated 6 months ago
- A helper library full of URL-related heuristics. ☆70 Updated last month
- Ultimate Website Sitemap Parser ☆222 Updated 3 weeks ago
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆142 Updated 6 months ago
- A webmining CLI tool & library for python. ☆331 Updated last month
- Build a site taxonomy from a list of keywords, provided via CSV file upload, or by connecting to a Google Search Console property ☆33 Updated 9 months ago
- A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine ☆178 Updated 6 months ago
- Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k se… ☆153 Updated last year
- ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of diff… ☆88 Updated 3 years ago
- A web scraper for TikTok using Playwright ☆96 Updated 2 months ago
- A Python scraper for the Facebook Ad Library, using the official Facebook Ad Library API. ☆119 Updated 5 years ago
- News crawling with StormCrawler - stores content as WARC ☆351 Updated 4 months ago
- advertools crawler UI ☆28 Updated 2 years ago
- This repository provides usage examples for the Python module Newspaper3k. ☆147 Updated last year
- Content Extraction using the PageRank algorithm to find the element containing the best content. ☆12 Updated 5 years ago
- Python port of Boilerpipe library ☆88 Updated 10 months ago
- A Python script to decode Google News article URLs. ☆196 Updated 2 months ago
- Python wrapper for Google people-also-ask ☆107 Updated 10 months ago
- A list of over 5000 US news domains and their social media accounts ☆44 Updated 2 years ago
- Article extraction benchmark: dataset and evaluation scripts ☆318 Updated last year
- Find "People Also Ask" questions ☆60 Updated 2 years ago
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023 ☆19 Updated 2 years ago
- Cloud crawler functions for scrapeulous ☆45 Updated 4 years ago
- Index Common Crawl archives in tabular format ☆122 Updated 2 months ago
- A fork of Dragnet that also extracts author, headline, date, keywords from context, as well as built-in metadata extraction all in one pac… ☆285 Updated last month
- Tag news stories based on models trained on the NYT corpus. ☆42 Updated 2 years ago
- Repo for Content for iCodeSEO.dev ☆23 Updated 4 years ago
- how hard is it to get a list of all local news sites in the United States (LOL) ☆8 Updated 5 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- Google Search Results Pages Dashboard ☆37 Updated 2 years ago