DBeath / feedsearch-crawler
Crawl sites for RSS, Atom, and JSON feeds.
☆76Updated last year
Alternatives and similar repositories for feedsearch-crawler
Users who are interested in feedsearch-crawler are comparing it to the libraries listed below.
- Find RSS, Atom, XML, and RDF feeds on webpages☆30Updated 8 months ago
- Automatically extracts and normalizes an online article or blog post publication date☆117Updated last year
- A library to extract a publication date from a web page, along with a measure of the accuracy.☆41Updated 5 years ago
- ☆13Updated 6 years ago
- Extract text from HTML☆134Updated 4 years ago
- Fast and robust date extraction from web pages, with Python or on the command-line☆130Updated 5 months ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t…☆33Updated 2 years ago
- LLM plugin for embeddings using sentence-transformers☆66Updated 2 months ago
- Tool to extract the text from web article URLs and get word frequencies, entity recognition, automatic summaries, and more☆20Updated 6 years ago
- A natural language date parser. (Python version of chrono.js)☆25Updated 3 weeks ago
- This repository provides usage examples for the Python module Newspaper3k.☆147Updated last year
- This is a proof-of-concept of using an LLM to find and extract meaningful data without parsing the html too much.☆29Updated 2 years ago
- Search sites for RSS, Atom, and JSON feeds.☆18Updated 2 years ago
- Yet another tool to search through your (exported) ChatGPT conversations☆12Updated 8 months ago
- Generate a list of your GitHub stars by topic - automatically!☆78Updated 2 years ago
- A Python library to detect and extract listing data from an HTML page.☆108Updated 8 years ago
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters☆141Updated 5 months ago
- Measure the readability of a given text using surface characteristics☆78Updated 5 months ago
- Add website scraping abilities to Datasette☆63Updated 2 years ago
- 📖 Using deep learning and scraping to analyze/summarize articles! Just drop in any URL!☆19Updated 2 years ago
- Python code to scrape and collect data from the RSS feeds Facebook uses to augment its Trending Section☆57Updated 6 years ago
- Scrape Twitter API without authentication using Nitter.☆63Updated 2 years ago
- 🥐 Open-source LLM-friendly Markdown/JSON generator☆88Updated last month
- Article extraction benchmark: dataset and evaluation scripts☆317Updated last year
- API interface to the Raindrop Bookmark Manager.☆36Updated last week
- ES Local Indexer - Desktop search powered by Elasticsearch☆27Updated 5 years ago
- Python port of Boilerpipe library☆88Updated 10 months ago
- Didactic Web crawler for Web Search Engines (CS 6913) course at NYU☆11Updated 2 years ago
- Search for words, documents, images, videos, news and maps using the Brave search engine. Downloading files and images to a local hard dr…☆62Updated last year
- The Selenium scraper that collected a million stories from Medium.com☆80Updated 6 years ago