ivbeg / newsworker
Advanced news feeds extractor and finder library. Helps to automatically extract news from websites without RSS/ATOM feeds
☆80 Updated 2 years ago
Alternatives and similar repositories for newsworker:
Users who are interested in newsworker compare it with the libraries listed below.
- Lazy helper tool to make scraping easier for simple tasks ☆18 Updated 2 years ago
- Project on analyzing the evolution of text topics over time ☆81 Updated 2 years ago
- Find rss, atom, xml, and rdf feeds on webpages ☆30 Updated 6 months ago
- Quick and dirty date parsing Python library to parse HTML dates really fast ☆21 Updated last year
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆56 Updated last year
- Bot for forwarding updates from RSS/Atom feeds to Telegram ☆56 Updated 3 months ago
- Russian name parsers, gender identification, and processing tools ☆129 Updated last year
- Python client for Yandex.XML ☆19 Updated 2 years ago
- Python library to read, write and convert data files with formats BSON, JSON, NDJSON, Parquet, ORC, XLS, XLSX and XML ☆16 Updated 2 weeks ago
- Universal parser that converts declarations into a format for submission to Declarator (Декларатор). ☆18 Updated 5 months ago
- VK dumper - saves almost everything from your vk.com wall, documents, etc. ☆43 Updated 6 years ago
- Classification and aggregation of Russian news articles. University coursework. ☆17 Updated 6 years ago
- API - extract a list of keywords from a text. ☆18 Updated 7 years ago
- Parses Firefox/Chrome HTML bookmarks files ☆49 Updated last year
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- Telegram bot forwarding messages to the inbox ☆139 Updated this week
- Russian Text Expansion based on ruGPT3Large ☆25 Updated 2 years ago
- Comparing quality and performance of NLP systems for the Russian language ☆47 Updated last year
- Simple framework for building Instagram chat bots with menu driven interface ☆18 Updated 4 years ago
- VK-Top is used for getting popular posts of any publicly available page at VK.com ☆39 Updated 2 years ago
- Scrape and parse Google search results in Python ☆31 Updated last year
- Extract text from HTML ☆135 Updated 4 years ago
- Web scraping Page Objects core library ☆99 Updated 2 months ago
- Tag news stories based on models trained on the NYT corpus. ☆42 Updated 2 years ago
- Looks for new flats in Odessa and sends notifications in Telegram through a bot ☆53 Updated 2 years ago
- Read It Later for Telegram ☆83 Updated 7 years ago
- metawarc: a command-line tool for metadata extraction from files from WARC (Web ARChive) ☆29 Updated 8 months ago
- Registry of data portals, catalogs, and data repositories, including a data catalogs dataset and a catalog description standard ☆41 Updated 3 months ago
- A professional-grade text randomizer and ad generator by Airat Halitov, perfect for creating unique, human-readable content at scale. ☆23 Updated 3 weeks ago
- Broad crawler for domain discovery ☆19 Updated 6 years ago