DBeath / feedsearch-crawler
Crawl sites for RSS, Atom, and JSON feeds.
☆62 Updated 5 months ago
Related projects
Alternatives and complementary repositories for feedsearch-crawler
- Search sites for RSS, Atom, and JSON feeds. ☆18 Updated last year
- Find RSS, Atom, XML, and RDF feeds on webpages ☆30 Updated last month
- Spider templates for automatic crawlers. ☆24 Updated this week
- A Python library for finding feed links on websites. ☆50 Updated 2 years ago
- ☆13 Updated 5 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆42 Updated 5 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆57 Updated 9 months ago
- Python API wrapper for https://mercury.postlight.com/web-parser/ ☆23 Updated last year
- Aiohttp web server API, which scrapes Google and returns scrape results as response. Supports proxies, multiple geos and number of result… ☆53 Updated 9 months ago
- Scrapy middleware that allows crawling only new content ☆79 Updated 2 years ago
- Automatically extracts and normalizes an online article or blog post publication date ☆118 Updated last year
- Asyncio web crawling framework. Work in progress. ☆18 Updated 3 months ago
- API - extract a list of keywords from a text. ☆18 Updated 7 years ago
- Quickly download and scrape websites on a massive scale. ☆63 Updated 12 years ago
- Generate a list of your GitHub stars by topic - automatically! ☆71 Updated last year
- A component that tries to avoid downloading duplicate content ☆27 Updated 6 years ago
- Paginating the web ☆37 Updated 10 years ago
- ☆74 Updated last year
- Parsing resumes in PDF format from LinkedIn ☆66 Updated 8 years ago
- Matrix-based News Aggregation to Explore Media Bias ☆20 Updated 6 years ago
- Renders Day One app entries as HTML ☆8 Updated 7 years ago
- Scrapy middleware for the autologin ☆37 Updated 6 years ago
- 🥐 Open-source RSS feed generator for Google Sheets. ☆74 Updated last month
- Common crawl extractor ☆69 Updated 6 months ago
- A natural language date parser. (Python version of chrono.js) ☆25 Updated 5 months ago
- A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine. ☆109 Updated 9 months ago
- Search for words, documents, images, videos, news and maps using the Brave search engine. Downloading files and images to a local hard dr… ☆43 Updated 6 months ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆33 Updated last week
- A maximum-strength name parser for record linkage. ☆34 Updated 3 months ago