mediacloud / metadata-lib
How Media Cloud approaches extracting metadata from online news stories
☆12 Updated 2 months ago
Alternatives and similar repositories for metadata-lib:
Users who are interested in metadata-lib are comparing it to the libraries listed below:
- Fast and robust date extraction from web pages, with Python or on the command-line ☆122 Updated last month
- Ultimate Website Sitemap Parser ☆190 Updated this week
- A helper library full of URL-related heuristics. ☆64 Updated 4 months ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ☆93 Updated last year
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- Python port of Boilerpipe library ☆86 Updated 6 months ago
- A Flexible Deep Learning Approach to Fuzzy String Matching ☆141 Updated 4 months ago
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆133 Updated last month
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆157 Updated 2 years ago
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023 ☆19 Updated last year
- A Directory of Online Newspaper Sources for 70+ Languages ☆33 Updated 3 years ago
- A list of over 5000 US news domains and their social media accounts ☆43 Updated 2 years ago
- Index Common Crawl archives in tabular format ☆110 Updated 3 months ago
- Tag news stories based on models trained on the NYT corpus. ☆42 Updated last year
- Keyword spaCy is a spaCy pipeline component for extracting keywords from text using cosine similarity. ☆11 Updated last year
- ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of diff… ☆88 Updated 3 years ago
- Legal document classification with EuroVoc descriptors on 22 languages. ☆25 Updated last year
- A database of courts, tests and other experiments ☆67 Updated this week
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆33 Updated last year
- A Python scraper for the Facebook Ad Library, using the official Facebook Ad Library API. ☆118 Updated 5 years ago
- Cloud crawler functions for scrapeulous ☆45 Updated 3 years ago
- An experimental implementation of Burrows' Delta in Python 3 ☆20 Updated 3 years ago
- Command-line utility to help researchers collect video metadata from the YouTube API ☆29 Updated 5 months ago
- Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation. ☆148 Updated 3 weeks ago
- Cross-platform GUI Client for Computer Vision APIs (Google Vision, Microsoft Cognitive Services, Clarifai and Keras' open source models) ☆21 Updated last year
- Blazing fast topic modelling for short texts. ☆31 Updated last month
- Inspect a URL and estimate if it contains a news story ☆39 Updated 3 months ago
- A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine ☆166 Updated last month
- Extracting addresses from text ☆42 Updated 6 years ago
- Asent is a Python library for performing efficient and transparent sentiment analysis using spaCy. ☆117 Updated 10 months ago