dsynkov / newspaper-bulk
CLI to extract article contents in bulk using Newspaper3k and multithreading.
☆13 · Updated 7 years ago
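The description above says the tool combines Newspaper3k with multithreading. The code below is a minimal sketch of that general pattern. It is not taken from the newspaper-bulk repository. It assumes Newspaper3k's `Article` class with its `download()` and `parse()` methods and its `title` and `text` attributes. The names `newspaper_extract` and `extract_bulk` are hypothetical. A thread pool works here because the work is dominated by network I/O.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional


def newspaper_extract(url: str) -> Dict[str, str]:
    """Hypothetical helper: download and parse one article with Newspaper3k.

    Assumes `pip install newspaper3k`; the import is deferred so the rest of
    this sketch can be loaded (and tested) without the package installed.
    """
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return {"url": url, "title": article.title, "text": article.text}


def extract_bulk(
    urls: Iterable[str],
    extract: Callable[[str], Dict[str, str]] = newspaper_extract,
    max_workers: int = 8,
) -> List[Dict[str, Optional[str]]]:
    """Hypothetical bulk runner: apply `extract` to many URLs in a thread pool.

    Results keep the input order, and a failing URL is recorded with an
    error message instead of aborting the whole batch.
    """

    def safe(url: str) -> Dict[str, Optional[str]]:
        try:
            result: Dict[str, Optional[str]] = dict(extract(url))
            result["error"] = None
            return result
        except Exception as exc:  # network errors, parse errors, etc.
            return {"url": url, "title": None, "text": None, "error": repr(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the input iterable.
        return list(pool.map(safe, urls))
```

For example, `extract_bulk(["https://example.com/a", "https://example.com/b"])` returns one dictionary per URL. You can also pass a different `extract` function, such as a stub in tests, without touching the threading logic.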
Alternatives and similar repositories for newspaper-bulk:
Users who are interested in newspaper-bulk are comparing it to the libraries listed below.
- Build intelligent data-driven applications with minimal effort. Sentence Clustering, Topics Extraction, Text Similarity, Opinion Summariz… · ☆40 · Updated 5 years ago
- A data science cookiecutter for Nesta projects. · ☆10 · Updated last week
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by The Incorporated… · ☆26 · Updated 2 years ago
- Dump of generated texts from GPT-2 trained on /r/legaladvice subreddit titles · ☆23 · Updated 5 years ago
- Lightweight intelligent searching of Elasticsearch data · ☆40 · Updated 4 years ago
- Pre-built template for using newspaper3k on AWS Lambda · ☆17 · Updated 2 years ago
- Reddit title generator API based on GPT-2 · ☆19 · Updated 5 years ago
- ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of diff… · ☆88 · Updated 3 years ago
- ☆34 · Updated last year
- A document on data readiness in the context of machine learning and natural language processing. · ☆11 · Updated 3 years ago
- Semantically distinct key-phrase extraction using Hilbert hashes. · ☆48 · Updated 3 years ago
- Socrates is a thin wrapper around an early-stage [AllenNLP](https://allennlp.org/) model that enables machine reading comprehension (MRC)… · ☆14 · Updated 4 years ago
- A helper library full of URL-related heuristics. · ☆69 · Updated 3 weeks ago
- spaCy pipeline component for adding text readability metadata to Doc objects. · ☆56 · Updated 6 years ago
- Fastlaw aims to replace generic word embeddings in supervised machine-learning NLP tasks on legal texts. · ☆38 · Updated 5 years ago
- A simple Flask & React app that demonstrates how to generate text with OpenAI's GPT-2 · ☆53 · Updated 2 years ago
- A Raspberry Pi 64-bit image with spaCy and neuralcoref pre-installed · ☆21 · Updated 5 years ago
- Deployable NER with BERT served over an HTTP API · ☆8 · Updated 4 years ago
- The News Landscape Toolkit (NELA) · ☆15 · Updated 4 years ago
- Word analysis, by domain, on the Common Crawl dataset for finding industry trends · ☆56 · Updated last year
- Python SDK for the TextRazor Text Analytics API · ☆20 · Updated last year
- Boolean text search in Python · ☆45 · Updated 2 years ago
- Reproducing the "Writing with Transformer" demo, using aitextgen/FastAPI in the backend and Quill/React in the frontend · ☆28 · Updated 4 years ago
- ☆11 · Updated 5 years ago
- Finds linguistic patterns effortlessly · ☆36 · Updated last year
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… · ☆33 · Updated 2 years ago
- Topic inference with zero-shot models · ☆61 · Updated last year
- Source real estate prices from the Common Crawl. · ☆27 · Updated 6 years ago
- Create a GeoNames gazetteer index in Elasticsearch · ☆76 · Updated last year
- Python-based Wikidata framework for easy dataframe extraction · ☆43 · Updated last year