petewarden / crunchcrawl
A project to gather, analyze and visualize the data in Crunchbase
☆46 Updated 13 years ago
Alternatives and similar repositories for crunchcrawl:
Users who are interested in crunchcrawl are comparing it to the libraries listed below:
- [not maintained] Custom Twitter Search via ElasticSearch&Wicket ☆59 Updated 4 years ago
- Where 2.0 Workshop Code: Spatial Analysis of Tweets using Hadoop, Pig, Python & Mechanical Turk. Slides here: http://www.slideshare.net/… ☆134 Updated 15 years ago
- A restful web application for real-time typeahead and autocomplete ☆105 Updated 12 years ago
- Bulk loading for elastic search ☆184 Updated last year
- Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps. ☆158 Updated 2 years ago
- Rails app for tracking trends in server logs - powered by the Cloudera Hadoop Distribution on EC2 ☆355 Updated 13 years ago
- Zohmg is a data store for aggregation of multi-dimensional time series data, built on top of Hadoop, Dumbo and HBase. ☆174 Updated 12 years ago
- Java implementation of a probabilistic set data structure ☆143 Updated 7 years ago
- Pretty fast parser for probabilistic context free grammars ☆87 Updated 12 years ago
- The reference implementation of the SPEAR ranking algorithm in Python. ☆37 Updated 9 years ago
- KEA 5.0 (keyphrase extraction software), modified to be an XML-RPC service ☆42 Updated 13 years ago
- TweeQL is a Query Language for Tweets: SELECT brand(text) AS brand, sentiment(text) AS sentiment FROM twitter_sample; ☆192 Updated 10 years ago
- A Hadoop toolkit for web-scale information retrieval research ☆83 Updated 10 years ago
- Examples of use of pig scripting languages capabilities ☆39 Updated 8 years ago
- Realtime Analytics ☆68 Updated 12 years ago
- Crux is a reporting application for HBase. Crux provides a simple web based graphical interface to access HBase, query data and create re… ☆100 Updated 12 years ago
- example code for "Large-scale social media analysis with Hadoop" tutorial presented at ICWSM 2010 ☆42 Updated 14 years ago
- Hadoop library for large-scale data processing, now an Apache Incubator project ☆583 Updated 10 years ago
- ☆116 Updated 13 years ago
- A collection of datasets and databases ☆24 Updated 6 years ago
- A scrapy-based Hacker News crawler. ☆151 Updated 11 years ago
- Some utilities for Lucene ☆110 Updated 11 years ago
- An implementation of the MinHash algorithm in ruby using Murmur Hash ☆25 Updated 16 years ago
- Example code for running R on Hadoop ☆132 Updated 12 years ago
- trying shingling / resemblance / simhash / sketching to do some data deduping ☆98 Updated 9 years ago
- Jeremy's Machine Learning Library ☆52 Updated 9 years ago
- A web renderer for geographic heat maps, using OpenStreetMap compatible file formats ☆103 Updated last year
- It counts ☆61 Updated 12 years ago
- An extension to PostgreSQL allowing Kyoto Cabinets to be used as a backing data store. ☆54 Updated 7 months ago
- Fast and intuitive exploratory data analysis ☆96 Updated 9 years ago