mediacloud / metadata-lib
How Media Cloud approaches extracting metadata from online news stories
☆13 · Updated 5 months ago
Alternatives and similar repositories for metadata-lib
Users who are interested in metadata-lib are comparing it to the libraries listed below.
- Fast and robust date extraction from web pages, with Python or on the command-line ☆127 · Updated 5 months ago
- A helper library full of URL-related heuristics. ☆69 · Updated 2 months ago
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆138 · Updated 5 months ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆34 · Updated 2 years ago
- Extract networks of entities from journalistic reporting ☆48 · Updated last year
- Keyword spaCy is a spaCy pipeline component for extracting keywords from text using cosine similarity. ☆11 · Updated last year
- A list of over 5000 US news domains and their social media accounts ☆45 · Updated 2 years ago
- A Python scraper for the Facebook Ad Library, using the official Facebook Ad Library API. ☆119 · Updated 5 years ago
- Common Crawl extractor ☆75 · Updated last year
- Command-line utility to help researchers collect video metadata from the YouTube API ☆29 · Updated 9 months ago
- Ultimate Website Sitemap Parser ☆219 · Updated last month
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ☆94 · Updated 2 years ago
- A library to extract a publication date from a web page, along with a measure of its accuracy. ☆41 · Updated 5 years ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆161 · Updated 2 years ago
- A set of Jupyter notebooks demonstrating how to use the Media Cloud API. ☆37 · Updated 3 weeks ago
- Next-generation Punkt sentence boundary detection with zero dependencies ☆17 · Updated last month
- A News Article Collection Library ☆22 · Updated 2 years ago
- Remove DIVs, style stuff and normalize HTML preserving structure information ☆11 · Updated 3 months ago
- Inspect a URL and estimate if it contains a news story ☆39 · Updated 6 months ago
- Legal document classification with EuroVoc descriptors in 22 languages. ☆26 · Updated last year
- An EUR-Lex parser for Python. ☆30 · Updated 11 months ago
- Build a site taxonomy from a list of keywords, provided via CSV file upload, or by connecting to a Google Search Console property ☆31 · Updated 8 months ago
- A small command line tool and set of functions for studying coordination networks in Twitter and other social media data. ☆79 · Updated 2 years ago
- A verification “Swiss army knife” helping journalists, fact-checkers, and human rights defenders to save time and be more efficient in th… ☆35 · Updated this week
- LegalCrawler: A tool for automated scraping of English legal corpora ☆55 · Updated 2 years ago
- API client for Aleph, supports bulk entity and document upload. ☆28 · Updated 7 months ago
- A Flexible Deep Learning Approach to Fuzzy String Matching ☆145 · Updated 7 months ago
- ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of diff… ☆88 · Updated 3 years ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆160 · Updated 2 weeks ago
- Pushshift Telegram Ingest ☆86 · Updated 5 years ago