DBeath / feedsearch-crawler
Crawl sites for RSS, Atom, and JSON feeds.
☆59 Updated 3 months ago
Related projects:
- Find rss, atom, xml, and rdf feeds on webpages ☆30 Updated last year
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆42 Updated 5 years ago
- Search sites for RSS, Atom, and JSON feeds. ☆17 Updated last year
- Add website scraping abilities to Datasette ☆59 Updated last year
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆57 Updated 7 months ago
- how hard is it to get a list of all local news sites in the United States (LOL) ☆8 Updated 4 years ago
- ☆23 Updated this week
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆31 Updated last year
- Javascript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a oneshot CLI command to extract each page… ☆35 Updated this week
- Matrix-based News Aggregation to Explore Media Bias ☆19 Updated 6 years ago
- Extract text from HTML ☆129 Updated 4 years ago
- Use the Google Cloud Speech API to transcribe audio files from a podcast. ☆20 Updated 7 years ago
- Media Bias Fact Check extension ☆35 Updated this week
- ☆13 Updated 5 years ago
- Python port of Boilerpipe library ☆81 Updated last month
- Python code to scrape and collect data from the RSS feeds Facebook uses to augment its Trending Section ☆57 Updated 5 years ago
- Presentations on Quantified Self and Self-Tracking with Python ☆29 Updated last year
- Building a Job Dataset ☆21 Updated 2 years ago
- Text analysis for automatic bookmarking/keyword extraction ☆18 Updated 7 years ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆118 Updated 2 weeks ago
- Inspect a URL and estimate if it contains a news story ☆39 Updated 3 weeks ago
- Python package for converting xml and epubs to text files ☆34 Updated 4 years ago
- A repository demonstrating the use of real-estate-scrape to store the estimated value of a property on Redfin and Zillow every night usin… ☆28 Updated this week
- 💬NLP - Library for splitting email content into a human-written body and an automatically appended signature. ☆23 Updated 5 years ago
- A component that tries to avoid downloading duplicate content ☆27 Updated 6 years ago
- Generate a list of your GitHub stars by topic - automatically! ☆69 Updated last year
- A News Article Collection Library ☆22 Updated last year
- Quantified Self: A Personal Data Aggregator and Dashboard for Self-Trackers and Quantified Self Enthusiasts ☆17 Updated last year
- A financial disclosure data extraction tool. ☆13 Updated last year
- ChatGPT Conversations to Markdown is a Python script that converts your exported ChatGPT conversations into readable and well-formatted M… ☆31 Updated last year