petewarden / buzzprofilecrawl
A simple script to crawl Google Profile pages and extract their information as structured data
☆90Updated 15 years ago
Alternatives and similar repositories for buzzprofilecrawl
Users who are interested in buzzprofilecrawl are comparing it to the repositories listed below.
- A PHP module that incorporates all known APIs that map an email address to user information☆107Updated 15 years ago
- The reference implementation of the SPEAR ranking algorithm in Python.☆37Updated 9 years ago
- Export a graph of the links between items crawled by Scrapy in DOT file format.☆26Updated 13 years ago
- Pretty fast parser for probabilistic context free grammars☆87Updated 12 years ago
- Visualizes search engine ranking algorithms for a given domain☆30Updated 14 years ago
- Lightweight, multilingual natural language processing☆63Updated 12 years ago
- A Python implementation of the Double Metaphone algorithm☆61Updated 14 years ago
- Common Crawl support library to access 2008-2012 crawl archives (ARC files)☆502Updated 7 years ago
- ... just because nltk is too heavy☆35Updated 14 years ago
- A command-line twitter client with smart filtering and statistical classification☆165Updated 14 years ago
- Faceted search engine for domain-specific exploration of the Web☆45Updated 8 years ago
- Adaptations and Extensions of Twitter-Related Examples from Mining the Social Web☆382Updated 11 years ago
- A web renderer for geographic heat maps, using OpenStreetMap compatible file formats☆103Updated 2 years ago
- Command line webpage screenshot and thumbnail generator☆191Updated 3 years ago
- API that extracts metadata from a URL.☆28Updated 10 years ago
- Django framework for crowdsourcing complex tasks using MTurk☆64Updated 14 years ago
- A simple Python library/tool for pulling location information from unstructured text☆186Updated 14 years ago
- ☆36Updated last year
- Demo of the Newspaper article extraction library.☆29Updated 10 years ago
- A scholarly authoring and publishing platform based on WordPress.☆138Updated 2 years ago
- Powering the #replaceawordinafamousquotewithduck micro-site☆25Updated 6 years ago
- ur.ly is a URL-shortening web app built on the Google App Engine. The production version lived at http://ur.ly/ before I sold the domain …☆55Updated 4 years ago
- A Prudence-based web services API for the Goose HTML content extraction library☆38Updated 13 years ago
- CrisisTracker is an open-source web platform that extracts situation awareness reports from public tweets during humanitarian disasters. …☆69Updated 9 years ago
- PANDA: A Newsroom Data Appliance☆205Updated 2 years ago
- Site Hound (previously THH) is a Domain Discovery Tool☆23Updated 4 years ago
- Open Source Social Media Monitoring And Engagement System Core/API☆36Updated 10 years ago
- Shit I should never forget☆46Updated 13 years ago
- Github contest☆40Updated 15 years ago
- Watching Twitter all day—so you don’t have to.☆175Updated 11 years ago