Apache Nutch is an extensible and scalable web crawler
☆3,135Updated this week
Alternatives and similar repositories for nutch
Users who are interested in nutch are comparing it to the libraries listed below
- Open Source Web Crawler for Java☆4,627Nov 4, 2021Updated 4 years ago
- A scalable web crawler framework for Java.☆11,703Dec 20, 2025Updated 2 months ago
- WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up …☆3,093Feb 10, 2026Updated 2 weeks ago
- Easy-to-use lightweight web crawler☆2,514Jan 23, 2026Updated last month
- nutcher is Chinese-language documentation for nutch, covering nutch configuration and source code analysis; continuously updated.☆130Jul 23, 2019Updated 6 years ago
- Apache Lucene and Solr open-source search software☆4,369Sep 25, 2024Updated last year
- The java implementation of Apache Dubbo. An RPC and microservice framework.☆41,738Feb 20, 2026Updated last week
- Apache Hadoop☆15,487Updated this week
- A database connection pool built for monitoring, from the Alibaba Cloud computing platform DataWorks (https://help.aliyun.com/document_detail/137663.html) team☆28,218Updated this week
- Google core libraries for Java☆51,479Updated this week
- Apache Storm☆6,671Feb 4, 2026Updated 3 weeks ago
- Apache ZooKeeper☆12,731Feb 19, 2026Updated last week
- Netty project - an event-driven asynchronous network application framework☆34,811Updated this week
- Apache HBase☆5,588Updated this week
- jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.☆11,341Feb 10, 2026Updated 2 weeks ago
- MyBatis SQL mapper framework for Java☆20,387Feb 21, 2026Updated last week
- Free and Open Source, Distributed, RESTful Search Engine☆76,165Feb 21, 2026Updated last week
- Apache Cassandra®☆9,644Updated this week
- Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-l…☆2,559Oct 10, 2024Updated last year
- Scrapy, a fast high-level web crawling & scraping framework for Python.☆59,832Feb 20, 2026Updated last week
- Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.☆20,685Updated this week
- Apache Shiro☆4,438Updated this week
- Apache Hive☆6,002Updated this week
- Mirror of Apache Kafka☆32,065Updated this week
- The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).☆3,582Updated this week
- FASTJSON 2.0.x has been released, faster and more secure, recommend you upgrade.☆25,717Jul 16, 2024Updated last year
- Apache Spark - A unified analytics engine for large-scale data processing☆42,898Updated this week
- Redisson - Valkey & Redis Java client. Real-Time Data Platform. Sync/Async/RxJava/Reactive API. Over 50 Valkey and Redis based Java objec…☆24,255Feb 20, 2026Updated last week
- Apache Flink☆25,825Updated this week
- Spring Framework☆59,649Updated this week
- Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and …☆14,205Updated this week
- Apache Druid: a high performance real-time analytics database.☆13,942Updated this week
- A simple, agile, distributed Java crawler framework with Spring Boot support.☆1,994Nov 25, 2024Updated last year
- Redis Java client☆12,280Updated this week
- Distributed scheduled job☆8,221Feb 7, 2026Updated 2 weeks ago
- Apache Kylin☆3,766Dec 29, 2025Updated 2 months ago
- Apache Mahout - an environment for quickly creating scalable, performant machine learning applications.☆2,208Updated this week
- Spring Boot helps you to create Spring-powered, production-grade applications and services with absolute minimum fuss.☆80,078Updated this week
- APM, Application Performance Monitoring System☆24,708Updated this week