dsynkov / newspaper-bulk
CLI to extract article contents in bulk using Newspaper3k and multithreading.
☆13 · Updated 6 years ago
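This page does not show the repository's code. The sketch below shows one way bulk extraction with Newspaper3k and multithreading can work. It assumes the `newspaper3k` package (`pip install newspaper3k`, imported as `newspaper`) and uses its `Article` class with `download()` and `parse()`. The helper names `newspaper_extract` and `bulk_extract`, the result fields, and the default of 8 worker threads are hypothetical. They are not the newspaper-bulk CLI's actual interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def newspaper_extract(url: str) -> Dict[str, str]:
    """Hypothetical helper: download and parse one article with Newspaper3k."""
    # Imported lazily so bulk_extract can be used (and tested) without newspaper3k.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()  # raises newspaper's ArticleException if the download failed
    return {"url": url, "title": article.title, "text": article.text}


def bulk_extract(
    urls: Iterable[str],
    extract: Callable[[str], Dict[str, str]] = newspaper_extract,
    max_workers: int = 8,  # hypothetical default; tune to the target sites
) -> List[Dict[str, str]]:
    """Hypothetical helper: run `extract` over many URLs in a thread pool.

    Article downloads are I/O-bound, so threads overlap network waits even
    with the GIL. Results come back in the same order as `urls`.
    """

    def safe(url: str) -> Dict[str, str]:
        # Record a per-URL failure instead of aborting the whole batch.
        try:
            return extract(url)
        except Exception as exc:
            return {"url": url, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order regardless of completion order.
        return list(pool.map(safe, urls))
```

A call such as `bulk_extract(["https://example.com/a", "https://example.com/b"], max_workers=4)` returns one dict per URL. Each dict holds either `title` and `text` or an `error` message. Passing `extract` as a parameter keeps the threading logic separate from the scraping library, so the pool can be exercised with a stub function.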
Alternatives and similar repositories for newspaper-bulk:
Users interested in newspaper-bulk are comparing it to the libraries listed below.
- A simple Flask & React app to demonstrate how to generate text with OpenAI's GPT-2 ☆52 · Updated 2 years ago
- A library to extract a publication date from a web page, along with a measure of its accuracy. ☆42 · Updated 5 years ago
- Dump of generated texts from GPT-2 trained on /r/legaladvice subreddit titles ☆23 · Updated 5 years ago
- ETL of newspaper article keywords using Apache Airflow, Newspaper3k, Quilt T4 and AWS S3 ☆15 · Updated 2 months ago
- Pre-built template for using Newspaper3k on AWS Lambda ☆16 · Updated 2 years ago
- Matrix-based News Aggregation to Explore Media Bias ☆20 · Updated 6 years ago
- Tag news stories based on models trained on the NYT corpus. ☆42 · Updated last year
- Build intelligent data-driven applications with minimal effort. Sentence Clustering, Topics Extraction, Text Similarity, Opinion Summariz… ☆40 · Updated 5 years ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆32 · Updated last year
- Fastlaw's purpose is to replace generic word embeddings in supervised machine-learning NLP tasks on legal texts. ☆37 · Updated 5 years ago
- Find RSS, Atom, XML, and RDF feeds on web pages ☆30 · Updated 3 months ago
- A (relatively) minimal-configuration app to run Twitter bots on a schedule that can scale to unlimited bots. ☆77 · Updated 3 years ago
- Releases for the reddit-graph project ☆18 · Updated 6 months ago
- Visualize large text collections with WebGL ☆25 · Updated 4 months ago
- API: extract a list of keywords from a text. ☆18 · Updated 7 years ago
- The project proposes a framework for applying topic models to a text corpus and then applying topic labels to the generated topics. ☆35 · Updated 8 months ago
- Cleans Reddit Text Data ☆81 · Updated 4 years ago
- Package for performing Reddit-based text analysis ☆20 · Updated 5 years ago
- A classifier that distinguishes political from non-political news articles. ☆28 · Updated last year
- Simple and clean Python implementation of TextRank as per seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot… ☆11 · Updated 3 years ago
- Python package for performing deduplication using flexible text matching and cleaning in pandas DataFrames ☆25 · Updated 4 years ago
- A base library for building web scrapers for statistical data, and a helper ontology for (primarily Swedish) statistical data. ☆13 · Updated last year
- A tidy and complete archive of metadata for papers on arxiv.org, 1993-2019 ☆28 · Updated 5 years ago
- Visual analytics application for qualitative text analysis ☆24 · Updated 2 years ago
- Examples for getting started using https://case.law ☆65 · Updated 2 years ago
- Stylometric framework in Python ☆13 · Updated 9 years ago
- A Raspberry Pi 64-bit image with spaCy and neuralcoref pre-installed ☆21 · Updated 5 years ago
- Package that returns a company embedding given a company name ☆42 · Updated 4 years ago
- A Google Trends Analytics Package ☆13 · Updated 7 months ago
- Topic modelling with spaCy, Gensim and Textacy ☆19 · Updated 6 years ago