apache / nutch
Apache Nutch is an extensible and scalable web crawler
☆3,005Updated 3 weeks ago
Alternatives and similar repositories for nutch:
Users who are interested in nutch are comparing it to the libraries listed below.
- Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.☆2,955Updated this week
- Open Source Web Crawler for Java☆4,584Updated 3 years ago
- WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up …☆3,072Updated 3 months ago
- A scalable web crawler framework for Java.☆11,542Updated 3 weeks ago
- A scalable, mature and versatile web crawler based on Apache Storm☆907Updated this week
- Easy-to-use lightweight web crawler☆2,513Updated last year
- Apache Curator☆3,135Updated this week
- Apache ActiveMQ Classic☆2,349Updated 2 weeks ago
- Ehcache 3.x line☆2,045Updated 3 months ago
- Drools is a rule engine, DMN engine and complex event processing (CEP) engine for Java.☆6,002Updated this week
- Apache ZooKeeper☆12,458Updated this week
- Apache Lucene and Solr open-source search software☆4,376Updated 7 months ago
- High performance non-blocking webserver☆3,636Updated last month
- Do not send pull requests! Automated Git clone of various OpenJDK branches☆2,162Updated 4 years ago
- cglib - Byte Code Generation Library is high level API to generate and transform Java byte code. It is used by AOP, testing, data access …☆4,846Updated 8 months ago
- Apache Tomcat☆7,783Updated last week
- A simple, agile, distributed Java crawler framework with Spring Boot support.☆1,985Updated 5 months ago
- Mirror of Apache HttpClient☆1,487Updated this week
- Code for Quartz Scheduler☆6,472Updated last week
- Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on…☆6,302Updated this week
- Advanced Java Redis client for thread-safe sync, async, and reactive usage. Supports Cluster, Sentinel, Pipelining, and codecs.☆5,561Updated this week
- Eclipse Jetty® - Web Container & Clients - supports HTTP/2, HTTP/1.1, HTTP/1.0, websocket, servlets, and more☆3,920Updated this week
- Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-l…☆2,550Updated 6 months ago
- Enterprise Stream Process Engine☆3,895Updated last year
- Apache Shiro☆4,365Updated this week
- BTrace - a safe, dynamic tracing tool for the Java platform☆5,879Updated this week
- Apache Lucene open-source search software☆2,928Updated this week
- A configurable web spider with an easy-to-use web console☆994Updated 6 years ago
- Provides support to increase developer productivity in Java when using Redis, a key-value store. Uses familiar Spring concepts such as a …☆1,796Updated this week
- Apache HBase☆5,328Updated this week