petewarden / crunchcrawl
A project to gather, analyze and visualize the data in Crunchbase
☆46 Updated 13 years ago
Alternatives and similar repositories for crunchcrawl:
Users who are interested in crunchcrawl are comparing it to the repositories listed below:
- Pretty fast parser for probabilistic context free grammars ☆87 Updated 11 years ago
- Neddick: Open Source Information Discovery Platform ☆36 Updated last year
- Where 2.0 Workshop Code: Spatial Analysis of Tweets using Hadoop, Pig, Python & Mechanical Turk. Slides here: http://www.slideshare.net/… ☆134 Updated 14 years ago
- [not maintained] Custom Twitter Search via ElasticSearch&Wicket ☆61 Updated 4 years ago
- Bixo is an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop. By building a customized Cascading p… ☆142 Updated 2 years ago
- A simple system for archiving and OCRing documents built for cloud-friendly search and backup. ☆22 Updated 4 years ago
- Rails app for tracking trends in server logs - powered by the Cloudera Hadoop Distribution on EC2 ☆355 Updated 13 years ago
- TweeQL is a Query Language for Tweets: SELECT brand(text) AS brand, sentiment(text) AS sentiment FROM twitter_sample; ☆193 Updated 10 years ago
- Hadoop library for large-scale data processing, now an Apache Incubator project ☆583 Updated 10 years ago
- Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps. ☆158 Updated 2 years ago
- Fast and intuitive exploratory data analysis ☆93 Updated 9 years ago
- playing around with the common crawl dataset ☆70 Updated 12 years ago
- Github contest ☆40 Updated 15 years ago
- A Python wrapper for Cascading ☆222 Updated 5 years ago
- Jeremy's Machine Learning Library ☆52 Updated 8 years ago
- natural language processing with link-grammar ☆18 Updated 15 years ago
- Bulk loading for elastic search ☆185 Updated last year
- Ranked Prefix Search for Large Data on External Memory optimized for Mobile with ZERO lag initialization time ☆16 Updated 6 years ago
- Examples from my book "Scripting Intelligence: Web 3.0 Information Gathering and Processing" ☆44 Updated 4 years ago
- Streaming and REST server implementation ☆45 Updated 12 years ago
- Zohmg is a data store for aggregation of multi-dimensional time series data, built on top of Hadoop, Dumbo and HBase. ☆174 Updated 12 years ago
- A RESTful web service that runs microtasks across multiple crowds, provides quality control techniques, and is easily extensible. ☆51 Updated 7 years ago
- NodeJS web analytics data collector for SnowPlow ☆39 Updated 10 years ago
- Analyze the structure and dynamics of an open source project's developer community, using graph algorithms, etc. ☆57 Updated 3 years ago
- Social Graph Analysis using Elastic MapReduce and PyPy ☆54 Updated 13 years ago
- Python-based utility for managing various distributed services on cloud providers ☆63 Updated 11 years ago
- A command-line twitter client with smart filtering and statistical classification ☆165 Updated 14 years ago
- Text classification using Naive Bayes and Elasticsearch ☆154 Updated 8 years ago