rcarmo / newsfeed-corpus
A Dockerized RSS feed fetcher for NLP work, using asyncio
☆20Updated 2 years ago
Alternatives and similar repositories for newsfeed-corpus
Users who are interested in newsfeed-corpus are comparing it to the libraries listed below
- Save data from Google Takeout to a SQLite database☆112Updated 2 years ago
- DIY Atom feeds in times of social media and paywalls☆85Updated last year
- Aviation grade news article metadata extraction☆36Updated 2 years ago
- An OPML file with 22 of the top 25 US newspapers RSS feeds☆56Updated 6 years ago
- Wikipedia citation tool for Google Books, New York Times, ISBN, DOI and more☆22Updated 8 years ago
- Web RSS aggregator and reader compatible with the Fever API☆148Updated last year
- A Telegram bot that records whatever you send it to MongoDB☆29Updated 4 years ago
- The missing datasets manager. Like Homebrew but for datasets. A CLI tool to search and discover datasets!☆41Updated 8 years ago
- A Python library to detect and extract listing data from an HTML page.☆108Updated 8 years ago
- linkbak is a web page archiver: it reads a list of links and dumps the corresponding pages in HTML and PDF.☆13Updated 2 years ago
- Personal Knowledge Management System. Capture your ideas using plain old text files. Make a journal that lasts 100 years.☆29Updated last year
- A dockerized, queued high fidelity web archiver based on Squidwarc☆61Updated last year
- Bookmark and archive webpages from the command line☆33Updated 6 years ago
- Grabbing all news.☆62Updated 5 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy.☆47Updated 7 years ago
- One-Click User Instigated Preservation☆128Updated 6 years ago
- Check out https://github.com/webrecorder/webrecorder for the newer version matching https://webrecorder.io☆38Updated 9 years ago
- Lightweight web scraping toolkit for documents and structured data.☆314Updated last year
- BUNT is a Bot UNderstanding Testbed☆36Updated 8 years ago
- Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your com…☆131Updated 5 months ago
- Python library for reading and writing warc files☆244Updated 3 years ago
- Documentation and project-wide issues for the Website Monitoring project (a.k.a. "Scanner")☆109Updated 6 months ago
- Tag-based bookmark manager inspired by delicious and Pinboard☆34Updated 2 years ago
- A Tree View For Tweets☆96Updated 3 years ago
- A Python utility for moving bookmarks/reading lists between services☆206Updated 9 years ago
- Now included in rigour☆151Updated 3 weeks ago
- Data validation as a service. Project retired; go to the current one at frictionsless/repository☆69Updated 2 years ago
- Analyze topics and trends in news with NLP☆48Updated 2 years ago
- A simple interface for extracting text from (almost) any URL☆53Updated 5 years ago
- CoCrawler is a versatile web crawler built using modern tools and concurrency.☆190Updated 3 years ago