ivbeg / newsworker
Advanced news feed extractor and finder library. Helps automatically extract news from websites that have no RSS/Atom feeds.
☆80Updated 2 years ago
Alternatives and similar repositories for newsworker
Users that are interested in newsworker are comparing it to the libraries listed below
- Lazy helper tool that makes scraping easier for simple tasks☆18Updated 2 years ago
- Find rss, atom, xml, and rdf feeds on webpages☆30Updated 8 months ago
- Universal parser that converts declarations into a format for submission to Declarator.☆18Updated 7 months ago
- Quick and dirty date parsing Python library to parse HTML dates really fast☆21Updated last year
- Parses Firefox/Chrome HTML bookmarks files☆49Updated last year
- Python library to read, write and convert data files with formats BSON, JSON, NDJSON, Parquet, ORC, XLS, XLSX and XML☆16Updated last month
- Aiohttp web server API, which scrapes Google and returns scrape results as response. Supports proxies, multiple geos and number of result…☆56Updated last year
- Russian names parsers, gender identification and processing tools☆131Updated last year
- Tag news stories based on models trained on the NYT corpus.☆42Updated 2 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends☆57Updated last year
- VK dumper - save almost everything from your vk.com wall, documents, etc☆43Updated 6 years ago
- Python client for Yandex.XML☆19Updated 2 years ago
- Scrape VK media☆57Updated last year
- Readability.io public code☆41Updated 9 years ago
- Bot for forwarding updates from RSS/Atom feeds to Telegram☆57Updated last week
- A library to extract a publication date from a web page, along with a measure of the accuracy.☆41Updated 5 years ago
- API - extract a list of keywords from a text.☆18Updated 7 years ago
- Scrapy: examples and useful information collected by members of the Telegram chat @scrapy_python☆79Updated last year
- Telegram bot forwarding messages to the inbox☆140Updated last week
- Repository for ru-syntax command line tool.☆16Updated 3 years ago
- ☆62Updated last year
- Proxy collector☆150Updated 2 years ago
- A component that tries to avoid downloading duplicate content☆27Updated 7 years ago
- Broad crawler for domain discovery☆19Updated 7 years ago
- Extracts tables from .docx files and saves them as .csv or .xls files☆63Updated last year
- Scrapy middleware that allows crawling only new content☆79Updated 2 years ago
- Poetry tools and russian text parser☆8Updated 8 years ago
- Extract text from HTML☆134Updated 4 years ago
- Paginating the web☆37Updated 11 years ago
- Russian data and parsers from database of registry of repression victims (http://lists.memo.ru/)☆11Updated 3 years ago