rcarmo / newsfeed-corpus
A Dockerized RSS feed fetcher for NLP work, using asyncio
☆20 Updated 2 years ago
Alternatives and similar repositories for newsfeed-corpus:
Users who are interested in newsfeed-corpus are comparing it to the libraries listed below
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆42 Updated 5 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆25 Updated 7 years ago
- Save data from Google Takeout to a SQLite database ☆107 Updated last year
- An OPML file with 22 of the top 25 US newspapers RSS feeds ☆55 Updated 6 years ago
- Add website scraping abilities to Datasette ☆62 Updated last year
- Python code to scrape and collect data from the RSS feeds Facebook uses to augment its Trending Section ☆57 Updated 6 years ago
- Code + Jupyter Notebooks for Visualizing Clusters of Clickbait Headlines Using Spark, Word2vec, and Plotly ☆47 Updated 4 years ago
- Serapis is a sentence identifier and modeling pipeline / built for Wordnik ☆24 Updated 8 years ago
- Aviation grade news article metadata extraction ☆36 Updated last year
- A simple interface for extracting text from (almost) any URL ☆52 Updated 5 years ago
- python library for extracting html microdata ☆166 Updated last year
- An interface for interacting with MediaWiki ☆37 Updated 3 years ago
- My dotfiles for macOS and Linux ☆19 Updated last year
- Sandstorm package of Paperwork - OpenSource note-taking & archiving alternative to Evernote, Microsoft OneNote & Google Keep ☆16 Updated 5 years ago
- Paginating the web ☆37 Updated 10 years ago
- Tag-based bookmark manager inspired by delicious and Pinboard ☆33 Updated 2 years ago
- Primary LocalWiki backend server environment ☆48 Updated 7 years ago
- A dockerized, queued high fidelity web archiver based on Squidwarc ☆57 Updated 6 months ago
- The news homepage archive ☆81 Updated 3 years ago
- This is the HeadQuarters of my digital info. HPI library got me inspired and I'm trying to play with the idea on a smaller scale for myse… ☆20 Updated last year
- A component that tries to avoid downloading duplicate content ☆27 Updated 6 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆43 Updated 7 years ago
- "Old SFM" -- manage rules and streams from social data sources, starting with twitter. ☆86 Updated last year
- One-Click User Instigated Preservation ☆123 Updated 5 years ago
- A Telegram bot that records whatever you send it to MongoDB ☆29 Updated 3 years ago
- Create a SQLite database containing data from your Pocket account ☆102 Updated last year
- The "hyp.is" service that takes a user to a URL with Hypothesis activated ☆49 Updated 3 weeks ago
- A simple command line interface to the datamade/dedupe library. ☆42 Updated 2 years ago
- Personal Knowledge Management System. Capture your ideas using plain old text files. Make a journal that lasts 100 years. ☆28 Updated last year
- An eBook tool to extract the ISBN or metadata from eBooks and rename them using an ISBN database and metadata ☆30 Updated 9 years ago