DBeath / feedsearch-crawler
Crawl sites for RSS, Atom, and JSON feeds.
☆77 Updated last year
Alternatives and similar repositories for feedsearch-crawler
Users who are interested in feedsearch-crawler are comparing it to the libraries listed below.
- Find rss, atom, xml, and rdf feeds on webpages ☆30 Updated 9 months ago
- ☆13 Updated 6 years ago
- Matrix-based News Aggregation to Explore Media Bias ☆20 Updated 7 years ago
- Generate a list of your GitHub stars by topic - automatically! ☆78 Updated 2 years ago
- 🥐 Open-source LLM-friendly Markdown/JSON generator ☆90 Updated last month
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆57 Updated last year
- API interface to the Raindrop Bookmark Manager. ☆37 Updated last week
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆142 Updated 6 months ago
- Telegram > OpenAI > Read Later [instapaper/pocket/omnivore] ☆17 Updated 2 years ago
- A helper library full of URL-related heuristics. ☆70 Updated last month
- Add website scraping abilities to Datasette ☆64 Updated 2 years ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆33 Updated 2 years ago
- Tool to extract the text from web article URLs and get word frequencies, entity recognition, automatic summary and more ☆20 Updated 6 years ago
- LLM plugin for embeddings using sentence-transformers ☆69 Updated 2 months ago
- Tag news stories based on models trained on the NYT corpus. ☆42 Updated 2 years ago
- Next-generation Punkt sentence boundary detection with zero dependencies ☆17 Updated 3 months ago
- 💡✏️️ ⬇️️ JSON to Markdown converter - Generate Markdown from format independent JSON ☆71 Updated 6 years ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆133 Updated 6 months ago
- A lightweight transcript editor for editing and correcting STT generated timed transcripts ☆46 Updated 2 months ago
- The Python script for downloading new mp3 from RSS given channels ☆128 Updated 4 months ago
- script that generates an rss feed out of websites that don't have one ☆31 Updated 6 years ago
- Presentations on Quantified Self and Self-Tracking with Python ☆30 Updated 2 years ago
- Yet another tool to search through your (exported) ChatGPT conversations ☆12 Updated 9 months ago
- Bookmarklet for multicolumn reader mode. ☆17 Updated last year
- A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine. ☆116 Updated last year
- Quantified Self: A Personal Data Aggregator and Dashboard for Self-Trackers and Quantified Self Enthusiasts ☆17 Updated 2 years ago
- A News Article Collection Library ☆22 Updated 2 years ago
- Airtable backup script package ☆23 Updated 3 years ago
- Advanced news feeds extractor and finder library. Helps to automatically extract news from websites without RSS/ATOM feeds ☆80 Updated 2 years ago