mediacloud / metadata-lib
How Media Cloud approaches extracting metadata from online news stories
☆14 Updated 6 months ago
Alternatives and similar repositories for metadata-lib
Users who are interested in metadata-lib are comparing it to the libraries listed below.
- Fast and robust date extraction from web pages, with Python or on the command-line ☆130 Updated 5 months ago
- A polite and user-friendly downloader for Common Crawl data ☆48 Updated last month
- A helper library full of URL-related heuristics. ☆69 Updated 2 weeks ago
- Python port of Boilerpipe library ☆88 Updated 10 months ago
- A list of over 5000 US news domains and their social media accounts ☆45 Updated 2 years ago
- Newsfeed based on GDELT Project ☆28 Updated last year
- Extract networks of entities from journalistic reporting ☆48 Updated last year
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆141 Updated 5 months ago
- Index Common Crawl archives in tabular format ☆122 Updated last month
- Command-line utility to help researchers collect video metadata from Youtube API ☆29 Updated 10 months ago
- Now included in rigour ☆151 Updated last month
- This repository provides usage examples for the Python module Newspaper3k. ☆147 Updated last year
- Scraper for Facebook's Archive of Ads with Political Content ☆37 Updated 6 years ago
- A Python scraper for the Facebook Ad Library, using the official Facebook Ad Library API. ☆119 Updated 5 years ago
- Keyword spaCy is a spaCy pipeline component for extracting keywords from text using cosine similarity. ☆11 Updated last year
- ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of diff… ☆88 Updated 3 years ago
- A tool for detecting viruses and NSFW material in WARC files ☆15 Updated 10 months ago
- Ultimate Website Sitemap Parser ☆221 Updated last week
- A classifier that distinguishes political from non-political news articles. ☆30 Updated last year
- Next-generation Punkt sentence boundary detection with zero dependencies ☆17 Updated 2 months ago
- Tools for auditing autocomplete on Google and Bing ☆24 Updated last week
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆33 Updated 2 years ago
- Measure the readability of a given text using surface characteristics ☆78 Updated 4 months ago
- TikTok Content Scraper -- No API-Key needed, minimal dependencies, citable | Download videos (MP4), slides (JPEG) and metadata of author,… ☆27 Updated 3 weeks ago
- Legal Matter Standard Specification (LMSS) library for Python ☆15 Updated last year
- ☆25 Updated 2 years ago
- Scrapers for U.S. county court sites. ☆69 Updated 2 years ago
- Collector for Facebook's Political Ad API ☆31 Updated 2 years ago
- Tag news stories based on models trained on the NYT corpus. ☆42 Updated 2 years ago
- A financial disclosure data extraction tool. ☆16 Updated last year