ivbeg / newsworker
Advanced news feed extractor and finder library. Helps automatically extract news from websites that lack RSS/Atom feeds.
☆81 · Updated last month
Alternatives and similar repositories for newsworker
Users interested in newsworker are comparing it to the libraries listed below.
- CoCrawler is a versatile web crawler built using modern tools and concurrency. ☆191 · Updated 3 years ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆33 · Updated 2 years ago
- A helper library full of URL-related heuristics. ☆73 · Updated 3 months ago
- ☆62 · Updated last year
- Parses Firefox/Chrome HTML bookmarks files ☆48 · Updated last year
- Extract text from HTML ☆135 · Updated 5 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆57 · Updated last year
- Python library to read, write and convert data files with formats BSON, JSON, NDJSON, Parquet, ORC, XLS, XLSX, XML and many others ☆21 · Updated last week
- API - extract a list of keywords from a text. ☆18 · Updated 8 years ago
- Lightweight library that converts an HTML webpage to JSON data using a template defined in JSON. ☆23 · Updated 7 months ago
- Firefox Web Extension to save Facebook posts as images ☆22 · Updated 4 years ago
- A Python package that helps scrape all news details from any news website ☆219 · Updated 6 months ago
- Lazy helper tool that makes simple scraping tasks easier ☆19 · Updated 3 years ago
- Matrix-based News Aggregation to Explore Media Bias ☆20 · Updated 7 years ago
- Aiohttp web server API, which scrapes Google and returns scrape results as response. Supports proxies, multiple geos and number of result… ☆59 · Updated last year
- Scrapy middleware that allows crawling only new content ☆79 · Updated 2 weeks ago
- Fast and robust date extraction from web pages, with Python or on the command line ☆142 · Updated last month
- Web scraping Page Objects core library ☆104 · Updated 2 weeks ago
- Simple framework for building Instagram chat bots with a menu-driven interface ☆18 · Updated 5 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 · Updated 6 years ago
- Tag news stories based on models trained on the NYT corpus. ☆42 · Updated 2 years ago
- Find RSS, Atom, XML, and RDF feeds on webpages ☆31 · Updated last month
- Firefox and Chrome compatible extension that acts as an annotation tool for websites (Named Entity Recognition) ☆10 · Updated 6 years ago
- FBLYZE is a Facebook scraping and analysis system. ☆67 · Updated 4 years ago
- This repository provides usage examples for the Python module Newspaper3k. ☆149 · Updated 2 years ago
- Detect and classify pagination links ☆104 · Updated 2 weeks ago
- A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine. ☆120 · Updated last year
- Python library for scraping Google search results ☆116 · Updated last year
- Python client for Yandex.XML ☆19 · Updated 2 years ago
- Paginating the web ☆37 · Updated 11 years ago