b-cube / nutch-crawler
Apache Nutch fork tuned for web services and data discovery.
☆9 Updated 9 years ago
Related projects
Alternatives and complementary repositories for nutch-crawler
- Topic modeling web application ☆39 Updated 9 years ago
- S3 backed ContentsManager for jupyter notebooks ☆13 Updated 8 years ago
- Big GeoSpatial Data Points Visualization Tool ☆19 Updated 8 years ago
- Ductile DB is a graph database based on Hadoop/HBase which provides a vast set of features. ☆13 Updated 6 years ago
- Vizlinc ☆14 Updated 8 years ago
- Library for building reproducible data pipelines to support experimentation ☆20 Updated 8 years ago
- Analyze the structure and dynamics of an open source project's developer community, using graph algorithms, etc. ☆57 Updated 3 years ago
- A dataset downloaded from the deep and scientific web across three major Polar data centers for use in research. ☆13 Updated 7 years ago
- Elasticsearch Latent Semantic Indexing experimentation ☆33 Updated 5 years ago
- An example project for doing grid search in MLlib ☆13 Updated 9 years ago
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them. ☆36 Updated 7 months ago
- PMML evaluator library for the PostgreSQL database (http://www.postgresql.org/) ☆11 Updated 9 years ago
- Solr for Astrophysics Data System ☆52 Updated last month
- ☆24 Updated 9 years ago
- ☆36 Updated 9 years ago
- Task Orchestration Tool Based on SWF and boto3 ☆38 Updated 6 years ago
- Distributed version restore tool for S3 ☆12 Updated 9 years ago
- A polite, minimal interface for sending python objects to and from Amazon S3. ☆57 Updated 8 years ago
- Proposals for new Jupyter subprojects to enter into incubation ☆18 Updated 4 years ago
- Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles. ☆38 Updated 6 years ago
- Seldon Spark Jobs ☆26 Updated 9 years ago
- Task scheduling and blocked algorithms for parallel processing ☆17 Updated 6 months ago
- Set of Hadoop, Spark and Storm based tools for web and customer analytics ☆34 Updated 3 years ago
- Distributed latent Dirichlet allocation ☆30 Updated 12 years ago
- A set of services that provide NLP facilities ☆25 Updated 3 years ago