DBeath / feedsearch-crawler
Crawl sites for RSS, Atom, and JSON feeds.
☆85 Updated last month
Alternatives and similar repositories for feedsearch-crawler
Users who are interested in feedsearch-crawler are comparing it to the libraries listed below.
- This repository provides usage examples for the Python module Newspaper3k. ☆148 Updated last year
- This is a proof-of-concept of using an LLM to find and extract meaningful data without parsing the HTML too much. ☆30 Updated 2 years ago
- Spider templates for automatic crawlers. ☆32 Updated this week
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆154 Updated last month
- Extract text from HTML ☆135 Updated 5 years ago
- Add website scraping abilities to Datasette ☆66 Updated 2 years ago
- A Collection of Awesome Personal Search Engines and Related Projects ☆20 Updated 2 years ago
- Scraping assistant tool. Editing and maintaining CSS/XPath selectors across webpages. ☆122 Updated 7 years ago
- API interface to the Raindrop Bookmark Manager. ☆41 Updated this week
- A Firefox and Google Chrome extension to clip websites and download them into a readable markdown file. ☆40 Updated 6 years ago
- Tool to index and serve HTML files. Powered by Datasette. ☆110 Updated 3 years ago
- A Python script for downloading new MP3 files from given RSS channels ☆140 Updated 9 months ago
- A helper library full of URL-related heuristics. ☆73 Updated 2 months ago
- A library to read a YML file with XPath or CSS Selectors and extract data from HTML pages using them ☆72 Updated 2 years ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆142 Updated last month
- 🥐 Open-source LLM-friendly Markdown/JSON generator ☆94 Updated last week
- Python Module to use the Readwise API ☆20 Updated this week
- A News Article Collection Library ☆22 Updated 2 years ago
- 📖👓🏷 Tag your getpocket.com articles automatically using natural language processing ☆45 Updated 6 years ago
- Parse government documents into well formed JSON ☆74 Updated last week
- Python wrapper for Raindrop.io API. ☆43 Updated 3 years ago
- A Python utility for moving bookmarks/reading lists between services ☆205 Updated 10 years ago
- Bookmarklet for multicolumn reader mode. ☆18 Updated last year
- Create high-quality images programmatically with easily-hackable templates. ☆190 Updated last year
- The most boring open source you've ever seen .... ☆127 Updated 2 years ago
- Python, JavaScript, and Rust libraries for the Spider Cloud API. ☆20 Updated 3 weeks ago
- Small Python library to read metadata information from an ePub (2 and 3) file. ☆45 Updated last year
- Scrape various open data directories to create an index of what's available out there ☆37 Updated 10 months ago
- A lightweight transcript editor for editing and correcting STT generated timed transcripts ☆52 Updated last month
- Scrape Twitter API without authentication using Nitter. ☆65 Updated 3 years ago