apache / nutch
Apache Nutch is an extensible and scalable web crawler
☆3,000 Updated this week
Alternatives and similar repositories for nutch:
Users who are interested in nutch are comparing it to the libraries listed below.
- Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. ☆2,945 Updated last week
- Open Source Web Crawler for Java ☆4,583 Updated 3 years ago
- WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web, you can set up … ☆3,074 Updated 2 months ago
- A scalable web crawler framework for Java. ☆11,528 Updated last month
- Apache Lucene and Solr open-source search software ☆4,374 Updated 6 months ago
- A scalable, mature and versatile web crawler based on Apache Storm ☆904 Updated last week
- Easy-to-use lightweight web crawler ☆2,511 Updated last year
- Apache HBase ☆5,312 Updated this week
- A configurable web spider with an easy-to-use web console ☆994 Updated 6 years ago
- Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-l… ☆2,547 Updated 5 months ago
- Apache ActiveMQ Classic ☆2,347 Updated last week
- Mirror of Apache Mahout ☆2,163 Updated last week
- When jsoup meets XPath. ☆469 Updated last year
- Apache Hive ☆5,667 Updated this week
- Apache Curator ☆3,133 Updated this week
- Apache ZooKeeper ☆12,427 Updated this week
- Azkaban workflow manager. ☆4,490 Updated 8 months ago
- Apache Kylin ☆3,685 Updated 2 weeks ago
- Apache log4j1 ☆873 Updated 2 years ago
- Redis Java client ☆12,004 Updated this week
- Apache Shiro ☆4,365 Updated this week
- Mirror of Apache HttpClient ☆1,487 Updated this week
- Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, with also keywo… ☆923 Updated last year
- No longer maintained. Please contact the original author. ☆662 Updated 6 years ago
- Spring integration for MyBatis 3 ☆2,857 Updated last week
- Drools is a rule engine, DMN engine and complex event processing (CEP) engine for Java. ☆5,979 Updated this week
- Eclipse Jetty® - Web Container & Clients - supports HTTP/2, HTTP/1.1, HTTP/1.0, websocket, servlets, and more ☆3,902 Updated this week
- Do not send pull requests! Automated Git clone of various OpenJDK branches ☆2,164 Updated 4 years ago
- Apache Tomcat ☆7,764 Updated this week
- Jodd! Lightweight. Java. Zero dependencies. Use what you like. ☆4,062 Updated 11 months ago