apache / nutch
Apache Nutch is an extensible and scalable web crawler
☆3,029Updated 2 months ago
Alternatives and similar repositories for nutch
Users who are interested in nutch are comparing it to the libraries listed below.
- Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.☆2,977Updated this week
- WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up …☆3,073Updated 4 months ago
- Open Source Web Crawler for Java☆4,594Updated 3 years ago
- Easy-to-use lightweight web crawler☆2,515Updated last year
- A scalable, mature and versatile web crawler based on Apache Storm☆914Updated this week
- A scalable web crawler framework for Java.☆11,566Updated 3 weeks ago
- Apache HBase☆5,347Updated this week
- Apache Storm☆6,632Updated this week
- A configurable web spider with an easy-to-use web console☆994Updated 6 years ago
- Ehcache 3.x line☆2,050Updated 2 weeks ago
- Enterprise Stream Process Engine☆3,893Updated last year
- Apache Lucene and Solr open-source search software☆4,376Updated 8 months ago
- A mature, highly concurrent JDBC connection pooling library, with support for caching and reuse of PreparedStatements.☆1,302Updated last week
- A simple, agile, distributed Java crawler framework with Spring Boot support.☆1,987Updated 6 months ago
- Apache ZooKeeper☆12,503Updated this week
- Apache Tomcat☆7,840Updated last week
- Mirror of Apache HttpClient☆1,489Updated this week
- Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-l…☆2,552Updated 7 months ago
- Apache Shiro☆4,380Updated this week
- Apache ActiveMQ Classic☆2,359Updated last week
- Mirror of Apache Mahout☆2,167Updated 2 weeks ago
- Apache log4j1☆869Updated 2 years ago
- Apache Curator☆3,140Updated last month
- Hibernate's core Object/Relational Mapping functionality☆6,165Updated this week
- Code for Quartz Scheduler☆6,516Updated last month
- Apache Struts is a free, open-source, MVC framework for creating elegant, modern Java web applications☆1,313Updated this week
- The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).☆3,022Updated this week
- Apache Hive☆5,713Updated this week
- Benchmark comparing serialization libraries on the JVM☆3,292Updated last year
- Jodd! Lightweight. Java. Zero dependencies. Use what you like.☆4,061Updated last year