johnbumgarner / newshound
This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.
☆33 Updated 2 years ago
Alternatives and similar repositories for newshound:
Users who are interested in newshound are comparing it to the libraries listed below.
- A News Article Collection Library ☆22 Updated last year
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP tasks with legal texts. ☆37 Updated 5 years ago
- Neural Elastic Inference and Search ☆19 Updated 5 years ago
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023 ☆19 Updated last year
- Code and data used to build a training dataset for dragnet models ☆10 Updated 4 years ago
- Extract networks of entities from journalistic reporting ☆48 Updated last year
- Matrix-based News Aggregation to Explore Media Bias ☆20 Updated 6 years ago
- Simple and clean Python implementation of TextRank as per the seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot… ☆11 Updated 4 years ago
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by The Incorporated… ☆26 Updated 2 years ago
- Extract dates from text ☆64 Updated 4 years ago
- This is a document concerning Data Readiness in the context of machine learning and Natural Language Processing. ☆11 Updated 3 years ago
- Text classification AutoML ☆21 Updated 3 years ago
- A Python package to get useful information from documents using the TopicRank algorithm. ☆16 Updated last year
- Finds linguistic patterns effortlessly ☆35 Updated last year
- A list of over 5000 US news domains and their social media accounts ☆45 Updated 2 years ago
- A financial disclosure data extraction tool. ☆14 Updated last year
- A classifier that distinguishes political from non-political news articles. ☆30 Updated last year
- Newsfeed based on GDELT Project ☆23 Updated 10 months ago
- An open-source NLP library: fast text cleaning and preprocessing ☆23 Updated 3 years ago
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal … ☆32 Updated 3 years ago
- Virtual patent marking crawler at iproduct.epfl.ch ☆14 Updated 7 years ago
- An experimental implementation of Burrows' Delta in Python 3 ☆21 Updated 3 years ago
- Code and data for Teddy (https://arxiv.org/abs/2001.05171). ☆15 Updated 2 years ago
- Bots for reviewing the credibility of web content: articles, tweets, sentences and websites ☆9 Updated 2 years ago
- Fast and robust date extraction from web pages, with Python or on the command line ☆124 Updated 2 months ago
- Interpretable feature construction from taxonomies for text classification ☆18 Updated 2 years ago
- An example of how to use spaCy for extremely large files without running into memory issues ☆36 Updated 2 years ago
- FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction ☆24 Updated 2 years ago
- 🧬 A VS Code extension for annotating data with Prodigy ☆30 Updated 3 years ago