rcarmo / newsfeed-corpus
A Dockerized RSS feed fetcher for NLP work, using asyncio
☆20 · Updated 3 years ago
Alternatives and similar repositories for newsfeed-corpus
Users who are interested in newsfeed-corpus are comparing it to the libraries listed below.
- An OPML file with 22 of the top 25 US newspapers' RSS feeds ☆56 · Updated 7 years ago
- RSS feed reader for Python 3 ☆88 · Updated 3 years ago
- Documentation and project-wide issues for the Website Monitoring project (a.k.a. "Scanner") ☆110 · Updated last month
- The missing datasets manager. Like Homebrew, but for datasets. A CLI tool to search for and discover datasets! ☆41 · Updated 8 years ago
- Check out https://github.com/webrecorder/webrecorder for a newer version matching https://webrecorder.io ☆38 · Updated 10 years ago
- Python code to scrape and collect data from the RSS feeds Facebook uses to augment its Trending Section ☆57 · Updated 7 years ago
- A tiny library for Python text normalisation. Useful for ad-hoc text processing. ☆157 · Updated 3 months ago
- Aviation grade news article metadata extraction ☆36 · Updated 2 years ago
- Primary LocalWiki backend server environment ☆47 · Updated 7 years ago
- CoCrawler is a versatile web crawler built using modern tools and concurrency. ☆191 · Updated 3 years ago
- Automatically extracts and normalizes the publication date of an online article or blog post ☆117 · Updated 2 years ago
- Web RSS aggregator and reader compatible with the Fever API ☆147 · Updated last year
- Python package to detect and return RSS / Atom feeds for a given website. The tool supports major blogging platforms including WordPress, … ☆21 · Updated 4 years ago
- A Python library to detect and extract listing data from HTML pages. ☆108 · Updated 8 years ago
- Tag-based bookmark manager inspired by Delicious and Pinboard ☆34 · Updated 3 years ago
- A Tree View For Tweets ☆97 · Updated 3 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆47 · Updated 7 years ago
- An eBook tool to extract the ISBN or metadata from eBooks and rename them using an ISBN database and the metadata ☆29 · Updated 10 years ago
- One-Click User Instigated Preservation ☆129 · Updated 6 years ago
- Rewriting web proxy and archival tool. At this point, it just tries to download all the things. ☆203 · Updated this week
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 · Updated 6 years ago
- Simple-to-use Python library for Buffer App ☆23 · Updated 3 years ago
- Paginating the web ☆37 · Updated 11 years ago
- Create and deploy a RESTful API with a few lines of YAML ☆32 · Updated 7 years ago
- Easy extraction of keywords and engines from search engine results pages (SERPs). ☆92 · Updated 2 months ago
- A Telegram bot that records whatever you send it to MongoDB ☆29 · Updated 4 years ago
- Python library for reading and writing WARC files ☆246 · Updated 3 years ago
- 🗄 Bot powering the @LinkArchiver Twitter tool to send tweeted URLs to the Wayback Machine ☆46 · Updated 8 years ago
- Tool for real-time scraping of news articles. ☆39 · Updated 6 years ago
- Slackbot for stock prices ☆15 · Updated 9 years ago