DBeath / feedsearch-crawler
Crawl sites for RSS, Atom, and JSON feeds.
☆90Updated 2 weeks ago
Alternatives and similar repositories for feedsearch-crawler
Users who are interested in feedsearch-crawler are comparing it to the libraries listed below.
- A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-python mode.☆352Updated last year
- Find rss, atom, xml, and rdf feeds on webpages☆31Updated 3 months ago
- A lightweight transcript editor for editing and correcting STT generated timed transcripts☆54Updated last month
- Fast and robust date extraction from web pages, with Python or on the command-line☆145Updated 3 months ago
- A Python script for downloading new MP3 files from given RSS channels☆141Updated 11 months ago
- Generate a list of your GitHub stars by topic - automatically!☆102Updated 3 years ago
- This is a proof-of-concept of using an LLM to find and extract meaningful data without parsing the html too much.☆30Updated 2 years ago
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters☆158Updated last month
- 🥐 Open-source LLM-friendly Markdown/JSON generator☆96Updated last month
- Wikidata's QRank as a SQLite DB.☆28Updated 2 years ago
- A fork of Dragnet that also extracts author, headline, date, keywords from context, as well as built in metadata extraction all in one pac…☆298Updated 8 months ago
- Yet another tool to search through your (exported) ChatGPT conversations☆13Updated last month
- A Collection of Awesome Personal Search Engines and Related Projects☆20Updated 3 years ago
- Ethical, legal, and effortless extraction of Reddit data in your database☆92Updated this week
- Search for words, documents, images, videos, news and maps using the Brave search engine. Downloading files and images to a local hard dr…☆79Updated 7 months ago
- link archive for year 2024☆18Updated 9 months ago
- Unofficial Otter.ai Python API☆82Updated 2 months ago
- Article extraction benchmark: dataset and evaluation scripts☆351Updated 4 months ago
- ☆13Updated 6 years ago
- A microservice for document conversion at scale☆97Updated 2 weeks ago
- Save an RSS or ATOM feed to a SQLite database☆57Updated 3 months ago
- 📰 Build RSS 2.0 feeds from websites (and JSON APIs) automatically or with a few CSS selectors.☆137Updated this week
- This repository provides usage examples for the Python module Newspaper3k.☆151Updated 2 years ago
- Extract text from HTML☆134Updated 2 weeks ago
- https://verdad.app☆86Updated last week
- Read all emails and store them in CSV through the Gmail API using Python☆16Updated 7 years ago
- Add website scraping abilities to Datasette☆66Updated 2 years ago
- A helper library full of URL-related heuristics.☆73Updated this week
- Readable YouTube Transcripts using Gemini 1.5 Flash 8B☆64Updated 8 months ago
- A Python utility for moving bookmarks/reading lists between services☆205Updated 10 years ago