apache / nutch
Apache Nutch is an extensible and scalable web crawler
☆2,960Updated last week
Alternatives and similar repositories for nutch:
Users who are interested in nutch are comparing it to the libraries listed below:
- Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.☆2,875Updated this week
- Open Source Web Crawler for Java☆4,570Updated 3 years ago
- WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web, you can set up …☆3,074Updated this week
- Apache Lucene and Solr open-source search software☆4,371Updated 3 months ago
- A scalable web crawler framework for Java.☆11,473Updated 2 weeks ago
- Easy-to-use lightweight web crawler☆2,505Updated 10 months ago
- Apache Tomcat☆7,667Updated this week
- A scalable, mature and versatile web crawler based on Apache Storm☆896Updated this week
- Apache ZooKeeper☆12,338Updated 2 weeks ago
- Ehcache 3.x line☆2,034Updated this week
- Apache Maven core☆4,442Updated this week
- Code for Quartz Scheduler☆6,373Updated last month
- Apache HBase☆5,264Updated this week
- Apache Storm☆6,609Updated this week
- Apache Curator☆3,124Updated this week
- Apache ActiveMQ Classic☆2,331Updated last week
- Mirror of Apache Mahout☆2,154Updated last month
- Apache Commons Lang☆2,762Updated this week
- Redis Java client☆11,935Updated this week
- Apache Struts is a free, open-source, MVC framework for creating elegant, modern Java web applications☆1,307Updated this week
- Jsoup study notes, with some study code and comments added.☆638Updated last year
- JAVA WEB + ORM Framework☆3,243Updated this week
- ZooKeeper client wrapper and rich ZooKeeper framework☆2,149Updated last year
- A simple, agile, distributed Java crawler framework with SpringBoot support.☆1,980Updated last month
- Do not send pull requests! Automated Git clone of various OpenJDK branches☆2,163Updated 4 years ago
- Capturing JVM- and application-level metrics. So you know what's going on.☆7,839Updated this week
- Mirror of Apache HttpClient☆1,475Updated last week
- A configurable web spider with an easy-to-use web console☆991Updated 6 years ago
- Enterprise Stream Process Engine☆3,908Updated last year
- A code generator for MyBatis.☆5,297Updated last week