mediacloud / date_guesser
A library to extract a publication date from a web page, along with a measure of the accuracy.
☆41 · Updated 5 years ago
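To illustrate the task this library addresses, here is a minimal Python sketch of one simple heuristic. It looks for a `/YYYY/MM/DD/` or `/YYYY/MM/` segment in the page URL and reports how precise the match is. This is not date_guesser's API. `guess_date_from_url`, `Guess`, and the `Accuracy` levels are hypothetical names chosen for this example, and the library itself may work quite differently.

```python
import re
from datetime import date
from enum import Enum
from typing import NamedTuple, Optional


class Accuracy(Enum):
    """Hypothetical precision levels for a guessed date (illustration only)."""
    NONE = 0
    MONTH = 1
    DAY = 2


class Guess(NamedTuple):
    published: Optional[date]
    accuracy: Accuracy


# Matches a /YYYY/MM/DD/ or /YYYY/MM/ segment anywhere in the URL.
_URL_DATE = re.compile(r"/(\d{4})/(\d{1,2})(?:/(\d{1,2}))?/")


def guess_date_from_url(url: str) -> Guess:
    """Return a date found in the URL path plus how precise it is."""
    match = _URL_DATE.search(url)
    if match is None:
        return Guess(None, Accuracy.NONE)
    year, month = int(match.group(1)), int(match.group(2))
    day = match.group(3)
    try:
        if day is not None:
            return Guess(date(year, month, int(day)), Accuracy.DAY)
        # Only year and month are known; pin the day to 1.
        return Guess(date(year, month, 1), Accuracy.MONTH)
    except ValueError:
        # Digits looked like a date but were out of range (e.g. month 13).
        return Guess(None, Accuracy.NONE)


if __name__ == "__main__":
    print(guess_date_from_url("https://example.com/2017/10/13/some_news.html"))
```

A URL match is cheap but only helps on sites that put dates in their paths. A fuller extractor would presumably also inspect the page HTML, which this sketch does not attempt.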
Alternatives and similar repositories for date_guesser:
Users who are interested in date_guesser are comparing it to the libraries listed below.
- Tag news stories based on models trained on the NYT corpus. ☆42 · Updated 2 years ago
- ☆30 · Updated 2 years ago
- Inspect a URL and estimate if it contains a news story ☆39 · Updated 3 months ago
- Find RSS, Atom, XML, and RDF feeds on webpages ☆30 · Updated 5 months ago
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023 ☆19 · Updated last year
- A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any othe… ☆67 · Updated 2 years ago
- Extract text from HTML ☆134 · Updated 4 years ago
- A web application for tagging and retrieving arguments in text ☆29 · Updated last year
- Topic Inference with Zeroshot models ☆61 · Updated last year
- Python package aiding in entity disambiguation based on string and location matching ☆18 · Updated last year
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ☆94 · Updated last year
- Python bindings to the Compact Language Detector ☆33 · Updated 4 years ago
- Jupyter notebook + Code for reproducing Reddit Subreddit graphs ☆17 · Updated 8 years ago
- code and data used to build a training dataset for dragnet models ☆10 · Updated 4 years ago
- Finds linguistic patterns effortlessly ☆35 · Updated last year
- A spaCy wrapper for DBpedia Spotlight ☆109 · Updated last year
- Python package for stylometry ☆61 · Updated 3 years ago
- CoCrawler is a versatile web crawler built using modern tools and concurrency. ☆190 · Updated 2 years ago
- Extract networks of entities from journalistic reporting ☆48 · Updated last year
- An index data structure for approximate string search. ☆23 · Updated 5 years ago
- A visualisation tool for spaCy using Hierplane. ☆65 · Updated 2 years ago
- NERtwork is a collection of scripts to help you create a network graph of co-occurring named entities using open source tools. This is do… ☆48 · Updated 11 months ago
- MinScIE is an Open Information Extraction system which provides structured knowledge enriched with semantic information about citations. ☆15 · Updated 5 years ago
- ☆12 · Updated 5 years ago
- Socrates is a thin wrapper around an early-stage [AllenNLP](https://allennlp.org/) model that enables machine reading comprehension (MRC)… ☆14 · Updated 4 years ago
- Trying to generate name synonyms from Wikidata ☆32 · Updated 4 years ago
- Extract city and country mentions from text, like GeoText but without regex, using FlashText, an Aho-Corasick implementation. ☆62 · Updated last week
- sumgram is a tool that summarizes a collection of text documents by generating the most frequent sumgrams (conjoined ngrams) ☆56 · Updated 7 months ago
- Add website scraping abilities to Datasette ☆62 · Updated 2 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆56 · Updated last year