apache / nutch
Apache Nutch is an extensible and scalable web crawler
☆3,037Updated 3 months ago
Alternatives and similar repositories for nutch
Users that are interested in nutch are comparing it to the libraries listed below
- Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.☆2,990Updated last week
- Open Source Web Crawler for Java☆4,594Updated 3 years ago
- WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web, you can set up …☆3,073Updated 5 months ago
- Easy-to-use lightweight web crawler☆2,514Updated last year
- A scalable web crawler framework for Java.☆11,578Updated last month
- Apache ActiveMQ Classic☆2,366Updated last week
- A scalable, mature and versatile web crawler based on Apache Storm☆919Updated this week
- cglib - Byte Code Generation Library is a high-level API to generate and transform Java byte code. It is used by AOP, testing, data access …☆4,857Updated 10 months ago
- Code for Quartz Scheduler☆6,527Updated 2 months ago
- Ehcache 3.x line☆2,049Updated last month
- Apache HBase☆5,351Updated this week
- Apache Curator☆3,138Updated this week
- Apache Shiro☆4,382Updated last week
- Apache Lucene and Solr open-source search software☆4,374Updated 9 months ago
- Do not send pull requests! Automated Git clone of various OpenJDK branches☆2,161Updated 4 years ago
- Mirror of Apache Mahout☆2,169Updated this week
- Apache ZooKeeper☆12,519Updated this week
- Enterprise Stream Process Engine☆3,892Updated 2 years ago
- Apache Tomcat☆7,860Updated this week
- The reliable, generic, fast and flexible logging framework for Java.☆3,116Updated 2 months ago
- A simple, agile, distributed Java crawler framework that supports Spring Boot.☆1,988Updated 7 months ago
- When jsoup meets XPath.☆469Updated last year
- A configurable web spider with an easy-to-use web console☆998Updated 6 years ago
- Ribbon is an Inter Process Communication (remote procedure calls) library with built in software load balancers. The primary usage model i…☆4,605Updated last week
- Mirror of Apache HttpClient☆1,489Updated this week
- Apache Commons Lang☆2,807Updated this week
- BTrace - a safe, dynamic tracing tool for the Java platform☆5,909Updated 3 weeks ago
- Redis Java client☆12,091Updated last week
- Provides support to increase developer productivity in Java when using Redis, a key-value store. Uses familiar Spring concepts such as a …☆1,798Updated 2 weeks ago
- Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-l…☆2,555Updated 8 months ago