dsynkov / newspaper-bulk
CLI to extract article contents in bulk using Newspaper3k and multithreading.
☆13 · Updated 6 years ago
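The one-line description above names the core pattern: run Newspaper3k's download-and-parse step over many URLs concurrently. The code below is a minimal sketch of that idea and is not the newspaper-bulk CLI itself. It assumes Newspaper3k's `Article` API (`download()`, `parse()`, `.title`, `.text`) and Python's standard `concurrent.futures` thread pool. The names `extract_bulk` and `newspaper_extract` and the per-URL `{"url", "error"}` failure record are hypothetical.

```python
"""Hypothetical sketch: bulk article extraction with Newspaper3k and threads."""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def newspaper_extract(url: str) -> Dict[str, str]:
    """Download and parse one article with Newspaper3k (pip install newspaper3k)."""
    # Imported lazily so the rest of the sketch loads even without the package.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return {"url": url, "title": article.title, "text": article.text}


def extract_bulk(
    urls: Iterable[str],
    extract: Callable[[str], Dict[str, str]] = newspaper_extract,
    max_workers: int = 8,
) -> List[Dict[str, str]]:
    """Apply `extract` to every URL on a thread pool, preserving input order.

    A failure is recorded for its URL instead of aborting the whole batch.
    """

    def safe_extract(url: str) -> Dict[str, str]:
        try:
            return extract(url)
        except Exception as exc:  # network errors, parse failures, etc.
            return {"url": url, "error": str(exc)}

    # Downloads are I/O-bound, so threads can overlap the network waits.
    # pool.map yields results in the same order as the input URLs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe_extract, urls))
```

Threads suit this workload because most of the time goes to network I/O rather than CPU. The `extract` parameter lets you swap in a stub for offline testing. A real run would look like `extract_bulk(["https://example.com/some-article"])`, which returns one dictionary per URL.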
Related projects
Alternatives and complementary repositories for newspaper-bulk
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆42 · Updated 5 years ago
- Pre-built template for using newspaper3k on AWS Lambda ☆16 · Updated last year
- Visualize large text collections with WebGL ☆25 · Updated 2 months ago
- Build intelligent data-driven applications with minimal effort. Sentence Clustering, Topics Extraction, Text Similarity, Opinion Summariz… ☆40 · Updated 5 years ago
- Collection of code snippets and utilities for Streamlit apps ☆22 · Updated 4 years ago
- spaCy pipeline component for adding text readability metadata to Doc objects. ☆56 · Updated 5 years ago
- Natural Language Generation for Gramex applications. ☆24 · Updated 2 years ago
- LanguageTool-style grammar handling with spaCy 2.0 ☆42 · Updated 6 years ago
- Fastlaw's purpose is to replace generic word embeddings in supervised machine learning NLP tasks involving legal texts. ☆37 · Updated 5 years ago
- ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of diff… ☆88 · Updated 2 years ago
- A visualisation tool for spaCy using Hierplane. ☆65 · Updated last year
- Tag news stories based on models trained on the NYT corpus. ☆40 · Updated last year
- A simple Flask & React app to demonstrate how to generate text with OpenAI's GPT-2 ☆52 · Updated last year
- Simple dashboard for getting currently trending hashtags and topics on Twitter ☆25 · Updated last year
- A suite of tools for collecting, pre-processing, analyzing and sentiment-scoring Twitter data ☆23 · Updated 4 years ago
- Language detection using spaCy and fastText ☆54 · Updated 11 months ago
- LNEx: Location Name Extractor ☆24 · Updated 4 years ago
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by The Incorporated… ☆25 · Updated 2 years ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆32 · Updated last year
- A tool built on several open-source services that uses OpenAI's GPT-2 as its base model. ☆4 · Updated last year
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023 ☆19 · Updated last year
- This repository provides usage examples for the Python module Newspaper3k. ☆142 · Updated 10 months ago
- The News Landscape Toolkit (NELA) ☆15 · Updated 4 years ago
- A conda-smithy repository for spaCy. ☆14 · Updated 2 weeks ago
- A Raspberry Pi 64-bit image with spaCy and neuralcoref preinstalled ☆21 · Updated 5 years ago
- Interactive tree-maps with SBERT & Hierarchical Clustering (HAC) ☆31 · Updated 6 months ago
- Clustering news and extracting trending news stories ☆12 · Updated 3 years ago
- Cleans Reddit text data ☆81 · Updated 4 years ago
- German sentiment scores with SentiWS as an extension for spaCy ☆36 · Updated last year
- ☆40 · Updated 9 years ago