DBeath / feedsearch-crawler
Crawl sites for RSS, Atom, and JSON feeds.
☆78 · Updated last week
Alternatives and similar repositories for feedsearch-crawler
Users who are interested in feedsearch-crawler are comparing it to the libraries listed below.
- Generate a list of your GitHub stars by topic - automatically! ☆83 · Updated 2 years ago
- This repository provides usage examples for the Python module Newspaper3k. ☆147 · Updated last year
- Fast and robust date extraction from web pages, with Python or on the command-line ☆138 · Updated last month
- A helper library full of URL-related heuristics. ☆70 · Updated 2 months ago
- Scraping assistant tool. Editing and maintaining CSS/XPath selectors across webpages. ☆105 · Updated 7 years ago
- LLM plugin for embeddings using sentence-transformers ☆70 · Updated 4 months ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆33 · Updated 2 years ago
- A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine. ☆116 · Updated last year
- 🥐 Open-source LLM-friendly Markdown/JSON generator ☆93 · Updated 3 weeks ago
- Extract text from HTML ☆134 · Updated 5 years ago
- Add website scraping abilities to Datasette ☆64 · Updated 2 years ago
- 📖👓🏷 Tag your getpocket.com articles automatically using natural language processing ☆45 · Updated 6 years ago
- A Python script for downloading new MP3s from given RSS channels ☆134 · Updated 5 months ago
- Tool to extract the text from web article URLs and get word frequencies, entity recognition, automatic summaries, and more ☆20 · Updated 6 years ago
- Library that helps use puppeteer in scrapy. ☆52 · Updated last month
- A Collection of Awesome Personal Search Engines and Related Projects ☆19 · Updated 2 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆57 · Updated last year
- Article extraction benchmark: dataset and evaluation scripts ☆321 · Updated last year
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 · Updated 6 years ago
- Spider templates for automatic crawlers. ☆31 · Updated 2 months ago
- Generate product descriptions, blogs, ads and more using GPT architecture with a single request to TextCortex API a.k.a Hemingwai ☆40 · Updated 2 years ago
- A News Article Collection Library ☆22 · Updated 2 years ago
- Create high-quality images programmatically with easily-hackable templates. ☆188 · Updated 11 months ago
- A Python library to detect and extract listing data from an HTML page. ☆108 · Updated 8 years ago
- CoCrawler is a versatile web crawler built using modern tools and concurrency. ☆190 · Updated 3 years ago
- Javascript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a oneshot CLI command to extract each page… ☆40 · Updated 11 months ago
- Scrape various open data directories to create an index of what's available out there ☆37 · Updated 6 months ago
- Python wrapper for Google people-also-ask ☆107 · Updated 11 months ago
- Automatically extracts and normalizes an online article or blog post publication date ☆117 · Updated 2 years ago
- A curated list of awesome twitter tools ☆226 · Updated last year