apache / nutch
Apache Nutch is an extensible and scalable web crawler
☆3,121 · Updated this week
Alternatives and similar repositories for nutch
Users who are interested in nutch are comparing it to the libraries listed below.
- Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. ☆3,178 · Updated this week
- Open Source Web Crawler for Java ☆4,628 · Updated 4 years ago
- WebCollector is an open-source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up … ☆3,091 · Updated 5 months ago
- Easy-to-use lightweight web crawler ☆2,517 · Updated 2 weeks ago
- A scalable web crawler framework for Java. ☆11,698 · Updated last month
- A scalable, mature and versatile web crawler based on Apache Storm ☆959 · Updated this week
- A configurable web spider with an easy-to-use web console ☆998 · Updated 7 years ago
- Apache ActiveMQ ☆2,414 · Updated this week
- Mirror of Apache HttpClient ☆1,524 · Updated last week
- Apache Storm ☆6,672 · Updated this week
- Ehcache 3.x line ☆2,078 · Updated 2 weeks ago
- Apache Mahout: an environment for quickly creating scalable, performant machine learning applications. ☆2,204 · Updated this week
- Apache Lucene and Solr open-source search software ☆4,370 · Updated last year
- Apache HBase ☆5,581 · Updated this week
- Do not send pull requests! Automated Git clone of various OpenJDK branches ☆2,142 · Updated 5 years ago
- Elasticsearch Java Rest Client. ☆2,108 · Updated 2 years ago
- cglib (Byte Code Generation Library) is a high-level API to generate and transform Java byte code. It is used by AOP, testing, data access … ☆4,889 · Updated last year
- Apache Curator ☆3,170 · Updated last week
- When jsoup meets XPath. ☆473 · Updated last week
- Apache Commons Lang ☆2,932 · Updated this week
- Apache Shiro ☆4,434 · Updated this week
- Apache log4j1 ☆867 · Updated 3 years ago
- A mature, highly concurrent JDBC connection pooling library, with support for caching and reuse of PreparedStatements. ☆1,316 · Updated last month
- Eclipse Jetty®: web container and clients; supports HTTP/3, HTTP/2, HTTP/1, WebSocket, servlets, and more ☆4,052 · Updated this week
- Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-l… ☆2,559 · Updated last year
- Enterprise Stream Process Engine ☆3,889 · Updated 2 years ago
- Mirror of Apache MINA ☆916 · Updated 2 weeks ago
- ZooKeeper client wrapper and rich ZooKeeper framework ☆2,138 · Updated 2 years ago
- The reliable, generic, fast and flexible logging framework for Java. ☆3,206 · Updated this week
- Apache Struts is a free, open-source MVC framework for creating elegant, modern Java web applications ☆1,345 · Updated this week