dsynkov / newspaper-bulk
CLI to extract article contents in bulk using Newspaper3k and multithreading.
☆13Updated 7 years ago
Alternatives and similar repositories for newspaper-bulk
Users that are interested in newspaper-bulk are comparing it to the libraries listed below
- Reddit title generator API based on GPT-2☆19Updated 5 years ago
- Tag news stories based on models trained on the NYT corpus.☆42Updated 2 years ago
- Source real estate prices from the Common Crawl.☆27Updated 6 years ago
- Visualize large text collections with WebGL☆25Updated 9 months ago
- Extracts key terminology (n-grams) from any large collection of documents (>1000) and forecasts emergence☆64Updated last year
- A library to extract a publication date from a web page, along with a measure of the accuracy.☆41Updated 5 years ago
- A base library for building web scrapers for statistical data, and a helper ontology for (primarily Swedish) statistical data.☆13Updated 3 months ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata☆94Updated 2 years ago
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by The Incorporated…☆26Updated 2 years ago
- ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of diff…☆88Updated 3 years ago
- Create a Geonames gazetteer index in Elasticsearch☆77Updated last year
- A Foursquare data scraper that gathers all venues within a specified geographic area.☆39Updated 6 years ago
- Dump of generated texts from GPT-2 trained on /r/legaladvice subreddit titles☆23Updated 6 years ago
- Quora Question Scraper - Find & Export relevant Questions 10x faster☆16Updated 5 years ago
- A News Article Collection Library☆23Updated 2 years ago
- A simple Flask & React app to demonstrate how to generate text with OpenAI's GPT-2☆53Updated 2 years ago
- Build intelligent data-driven applications with minimal effort. Sentence Clustering, Topics Extraction, Text Similarity, Opinion Summariz…☆40Updated 5 years ago
- Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP-tasks with legal texts.☆38Updated 5 years ago
- Language detection using spaCy and fastText☆55Updated last year
- A Raspberry Pi 64-bit image with spaCy and neuralcoref pre-installed☆21Updated 5 years ago
- Notebooks configured to be run with Binder, usually found on my blog.☆42Updated 2 years ago
- Language-agnostic political event coding using universal dependencies☆18Updated 6 years ago
- ☆11Updated 5 years ago
- Text analysis for automatic bookmarking/keyword extraction☆18Updated 8 years ago
- MinScIE is an Open Information Extraction system which provides structured knowledge enriched with semantic information about citations.☆15Updated 6 years ago
- A browser user interface for manual labeling of record pairs.☆47Updated last year
- Extract text from HTML☆135Updated 4 years ago
- 🚀GUI for training spaCy models☆55Updated 4 years ago
- Dataframe Integration with spaCy.☆103Updated 4 years ago
- LNEx: Location Name Extractor☆25Updated 5 years ago