apache / nutch
Apache Nutch is an extensible and scalable web crawler
☆ 2,988 · Updated 2 months ago
Alternatives and similar repositories for nutch:
Users who are interested in nutch are comparing it to the libraries listed below.
- Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. ☆ 2,932 · Updated this week
- Open Source Web Crawler for Java ☆ 4,579 · Updated 3 years ago
- WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web, you can setup … ☆ 3,075 · Updated 2 months ago
- Apache Lucene and Solr open-source search software ☆ 4,374 · Updated 5 months ago
- Easy-to-use lightweight web crawler (易用的轻量化网络爬虫) ☆ 2,509 · Updated last year
- Ehcache 3.x line ☆ 2,037 · Updated 2 months ago
- Code for Quartz Scheduler ☆ 6,436 · Updated 2 weeks ago
- A scalable web crawler framework for Java. ☆ 11,518 · Updated last month
- Mirror of Apache Mahout ☆ 2,162 · Updated last week
- Enterprise Stream Process Engine ☆ 3,901 · Updated last year
- Apache Lucene open-source search software ☆ 2,883 · Updated this week
- Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-l… ☆ 2,545 · Updated 5 months ago
- Apache log4j1 ☆ 873 · Updated 2 years ago
- Apache Curator ☆ 3,135 · Updated this week
- cglib - Byte Code Generation Library is a high-level API to generate and transform Java byte code. It is used by AOP, testing, data access … ☆ 4,837 · Updated 7 months ago
- Do not send pull requests! Automated Git clone of various OpenJDK branches ☆ 2,164 · Updated 4 years ago
- When jsoup meets XPath. ☆ 468 · Updated last year
- A scalable, mature and versatile web crawler based on Apache Storm ☆ 903 · Updated this week
- Apache Shiro ☆ 4,359 · Updated this week
- Apache Storm ☆ 6,614 · Updated this week
- Eclipse Jetty® - Web Container & Clients - supports HTTP/2, HTTP/1.1, HTTP/1.0, websocket, servlets, and more ☆ 3,900 · Updated this week
- No longer maintained. Please contact the original author. ☆ 662 · Updated 6 years ago
- Apache HBase ☆ 5,307 · Updated this week
- The official MongoDB drivers for Java, Kotlin, and Scala ☆ 2,628 · Updated this week
- A configurable web spider with an easy-to-use web console ☆ 994 · Updated 6 years ago
- Redis Java client ☆ 11,993 · Updated this week
- Java binary serialization and cloning: fast, efficient, automatic ☆ 6,264 · Updated this week
- Mirror of Apache HttpClient ☆ 1,484 · Updated this week
- Elasticsearch Java Rest Client. ☆ 2,116 · Updated last year
- A mature, highly concurrent JDBC Connection pooling library, with support for caching and reuse of PreparedStatements. ☆ 1,299 · Updated last month