DBeath / feedsearch-crawler
Crawl sites for RSS, Atom, and JSON feeds.
☆75 · Updated last year
Alternatives and similar repositories for feedsearch-crawler
Users who are interested in feedsearch-crawler are comparing it to the libraries listed below.
- A library to extract a publication date from a web page, along with a measure of its accuracy. ☆41 · Updated 5 years ago
- Find RSS, Atom, XML, and RDF feeds on web pages. ☆30 · Updated 7 months ago
- Search sites for RSS, Atom, and JSON feeds. ☆18 · Updated 2 years ago
- Extract text from HTML. ☆135 · Updated 4 years ago
- Presentations on Quantified Self and Self-Tracking with Python. ☆30 · Updated 2 years ago
- ☆13 · Updated 6 years ago
- Next-generation Punkt sentence boundary detection with zero dependencies. ☆17 · Updated 2 months ago
- LLM plugin for embeddings using sentence-transformers. ☆65 · Updated last month
- A Python utility for moving bookmarks/reading lists between services. ☆204 · Updated 9 years ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆34 · Updated 2 years ago
- Automatically extracts and normalizes an online article or blog post publication date. ☆117 · Updated last year
- 🥐 Open-source LLM-friendly Markdown/JSON generator. ☆88 · Updated last week
- Add website scraping abilities to Datasette. ☆62 · Updated 2 years ago
- Yet another tool to search through your (exported) ChatGPT conversations. ☆12 · Updated 8 months ago
- A Python library for standardizing lists of names, especially database/CSV column names. ☆23 · Updated 5 years ago
- Inspect a URL and estimate whether it contains a news story. ☆39 · Updated 6 months ago
- Datasette plugin for rendering Markdown. ☆29 · Updated last year
- ☆20 · Updated 2 years ago
- A Google Trends analytics package. ☆13 · Updated last year
- Scrapy downloader middleware that stores response HTML to disk. ☆18 · Updated last year
- A natural language date parser (Python version of chrono.js). ☆25 · Updated last week
- Check out https://github.com/webrecorder/webrecorder for a newer version matching https://webrecorder.io. ☆38 · Updated 9 years ago
- Feed discovery to share :) ☆41 · Updated 8 years ago
- Tool to extract the text from web article URLs and get word frequencies, entity recognition, automatic summaries, and more. ☆20 · Updated 6 years ago
- A component that tries to avoid downloading duplicate content. ☆27 · Updated 7 years ago
- Parse resumes in PDF format from LinkedIn. ☆68 · Updated 8 years ago
- Admin UI for Scrapy / open-source Scrapinghub. ☆58 · Updated 4 years ago
- Detect and classify pagination links. ☆103 · Updated 4 years ago
- Parse government documents into well-formed JSON. ☆70 · Updated last week
- Save an RSS or Atom feed to a SQLite database. ☆52 · Updated 2 years ago