johnbumgarner / newshound
This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around the world in over 50 languages.
☆33 Updated 2 years ago
Alternatives and similar repositories for newshound:
Users who are interested in newshound are comparing it to the libraries listed below.
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- A News Article Collection Library ☆22 Updated 2 years ago
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by The Incorporated… ☆26 Updated 2 years ago
- Measure the readability of a given text using surface characteristics ☆78 Updated 2 months ago
- Detecting gibberish as a type of sentiment analysis with GPT2 ☆24 Updated 4 years ago
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 Updated 2 years ago
- ☆10 Updated 5 years ago
- Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP-tasks with legal texts. ☆38 Updated 5 years ago
- Extract dates from text ☆64 Updated 4 years ago
- A list of over 5000 US news domains and their social media accounts ☆44 Updated 2 years ago
- NLG Best Practices for Data-Efficient Modeling: How to Train Production-Ready Models with Little Data ☆10 Updated 3 years ago
- MoodCat😼 classifies the mood of English sentences. ☆14 Updated 2 years ago
- The project lets you automatically extract glossary words and their definitions from a given piece of text using NLP techniques ☆29 Updated 4 years ago
- News API - fetch news from CommonCrawl, parse with NewsPlease, enrich with pre-trained machine-learning models, to structured searchable … ☆28 Updated 2 years ago
- A Python package to get useful information from documents using TopicRank Algorithm. ☆16 Updated last year
- Reddit title generator API based on GPT-2 ☆19 Updated 5 years ago
- Extract networks of entities from journalistic reporting ☆48 Updated last year
- A utility for labeling clusters of text data. ☆28 Updated 3 years ago
- [WIP] Behold, semantic-search, built over sentence-transformers to make it easy for search engineers to evaluate, optimise and deploy mod… ☆15 Updated last year
- Simple and clean Python implementation of TextRank as per the seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot… ☆11 Updated 4 years ago
- Rust python bindings for symspell ☆19 Updated last year
- Parse government documents into well formed JSON ☆68 Updated last month
- A collection of code, data and information related to our audit of TikTok. ☆21 Updated last month
- ☆30 Updated 2 years ago
- This repository contains code and data download instructions for the workshop paper "Improving Hierarchical Product Classification using … ☆17 Updated 3 years ago
- A financial disclosure data extraction tool. ☆15 Updated last year
- Text analysis for automatic bookmarking/keyword extraction ☆18 Updated 8 years ago
- Matrix-based News Aggregation to Explore Media Bias ☆20 Updated 6 years ago
- Scraper for Facebook, Gab, Google and TikTok ☆21 Updated 8 months ago
- Code examples for Google Natural Language API. ☆13 Updated 5 years ago