ivbeg / newsworker
Advanced news feed extractor and finder library. Helps automatically extract news from websites that have no RSS/Atom feeds
☆79 Updated 2 years ago
Alternatives and similar repositories for newsworker:
Users who are interested in newsworker are comparing it to the libraries listed below
- Lazy helper tool that makes scraping easier for simple tasks ☆18 Updated 2 years ago
- API - extract a list of keywords from a text. ☆18 Updated 7 years ago
- Find rss, atom, xml, and rdf feeds on webpages ☆30 Updated 4 months ago
- Quick and dirty date parsing Python library to parse HTML dates really fast ☆20 Updated last year
- Russian names parsers, gender identification and processing tools ☆129 Updated last year
- Bot for forwarding updates from RSS/Atom feeds to Telegram ☆55 Updated last month
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- Universal parser that converts declarations into the format used for submission to Declarator. ☆18 Updated 3 months ago
- Search sites for RSS, Atom, and JSON feeds. ☆19 Updated 2 years ago
- Extract text from HTML ☆133 Updated 4 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆55 Updated last year
- Telegram bot forwarding messages to the inbox ☆139 Updated last week
- Python wrapper for Ferret ☆41 Updated 3 years ago
- Parses Firefox/Chrome HTML bookmarks files ☆49 Updated 10 months ago
- Firefox and Chrome compatible extension that acts as annotation tool for websites (Named Entity Recognition) ☆10 Updated 5 years ago
- Tag news stories based on models trained on the NYT corpus. ☆42 Updated last year
- Scrapy: examples and useful information collected by members of the Telegram chat @scrapy_python ☆79 Updated last year
- Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery ☆53 Updated 7 months ago
- Python library to read, write and convert data files with formats BSON, JSON, NDJSON, Parquet, ORC, XLS, XLSX and XML ☆16 Updated 5 months ago
- Python client for Yandex.XML ☆18 Updated last year
- NoSQL extract, transform, load (ETL) toolkit with Python ☆12 Updated 3 months ago
- Proxy collector ☆150 Updated 2 years ago
- Broad crawler for domain discovery ☆19 Updated 6 years ago
- Extracts tables from .docx files and saves them as .csv or .xls files ☆61 Updated last year
- Web scraping Page Objects core library ☆96 Updated this week
- A component that tries to avoid downloading duplicate content ☆27 Updated 6 years ago
- The little things give you away... A collection of various small helper stuff – Mirror repo only, no longer kept in sync, refer to gitea.… ☆23 Updated 4 years ago
- A helper library full of URL-related heuristics. ☆64 Updated 4 months ago
- A simple Python wrapper for Yandex's Tomita Parser (no longer needed, since Yandex has open-sourced it) ☆17 Updated 9 years ago
- Simple summarize ML model ☆15 Updated 6 years ago