dsynkov / newspaper-bulk
CLI to extract article contents in bulk using Newspaper3k and multithreading.
☆13 · Updated 6 years ago
Alternatives and similar repositories for newspaper-bulk:
Users who are interested in newspaper-bulk are comparing it to the libraries listed below.
- Build intelligent data-driven applications with minimal effort. Sentence Clustering, Topics Extraction, Text Similarity, Opinion Summariz… ☆40 · Updated 5 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 · Updated 5 years ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ☆93 · Updated last year
- Pre-built template for using newspaper3k on aws lambda ☆16 · Updated 2 years ago
- Simple and clean Python implementation of TextRank as per seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot… ☆11 · Updated 4 years ago
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by The Incorporated… ☆26 · Updated 2 years ago
- GraphiPy: Universal Social Data Extractor ☆81 · Updated 2 years ago
- Visualize large text collections with WebGL ☆25 · Updated 5 months ago
- Google News Scraper for languages like Japanese, Chinese... [VPN Support] ☆96 · Updated 3 years ago
- 🚀GUI for training spaCy models ☆54 · Updated 3 years ago
- A visualisation tool for spaCy using Hierplane. ☆65 · Updated 2 years ago
- A TextBlob sentiment analysis pipeline component for spaCy. ☆56 · Updated 4 months ago
- A simple Flask & React app to demonstrate how to generate text with OpenAI's GPT-2 ☆53 · Updated 2 years ago
- Use ML-Annotate to label data for machine learning purposes ☆107 · Updated 4 years ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆33 · Updated last year
- Finds linguistic patterns effortlessly ☆35 · Updated last year
- API - extract a list of keywords from a text. ☆18 · Updated 7 years ago
- Tag news stories based on models trained on the NYT corpus. ☆42 · Updated last year
- NSS Capstone project to use natural language modeling, classification, and information extraction to get the exact employee count values … ☆15 · Updated 6 years ago
- spaCy match and replace, maintaining conjugation ☆35 · Updated 2 years ago
- A News Article Collection Library ☆22 · Updated last year
- Dump of generated texts from GPT-2 trained on /r/legaladvice subreddit titles ☆23 · Updated 5 years ago
- A Python package to get useful information from documents using TopicRank Algorithm. ☆16 · Updated last year
- This is the frontend layer of SearchX. SearchX is a scalable collaborative search system being developed by Lambda Lab of TU Delft. ☆14 · Updated last year
- Scalable String Similarity Joins in Python ☆38 · Updated 7 months ago
- Excel Integration with spaCy. Training NER using Excel/XLSX from PDF, DOCX, PPT, PNG or JPG. ☆105 · Updated 2 years ago
- Scraping Assisted by Learning ☆35 · Updated last month
- A Python library for creating adversarial splits ☆13 · Updated 2 years ago
- Python clients for Zyte AutoExtract API ☆40 · Updated 3 years ago