Apache Nutch is an extensible and scalable web crawler
☆3,155May 25, 2026Updated last week
Alternatives and similar repositories for nutch
Users that are interested in nutch are comparing it to the libraries listed below.
- Open Source Web Crawler for Java☆4,622Nov 4, 2021Updated 4 years ago
- A scalable web crawler framework for Java.☆11,677Dec 20, 2025Updated 5 months ago
- WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web, you can setup …☆3,094Feb 10, 2026Updated 3 months ago
- nutcher is Chinese-language documentation for nutch, covering nutch configuration and source code analysis; continuously updated.☆130Jul 23, 2019Updated 6 years ago
- Easy-to-use lightweight web crawler☆2,513Jan 23, 2026Updated 4 months ago
- Apache Lucene and Solr open-source search software☆4,361May 15, 2026Updated 2 weeks ago
- The java implementation of Apache Dubbo. An RPC and microservice framework.☆41,513Updated this week
- Apache Hadoop☆15,554Updated this week
- A database connection pool built for monitoring, from the Alibaba Cloud computing platform DataWorks (https://help.aliyun.com/document_detail/137663.html) team.☆28,186May 12, 2026Updated 3 weeks ago
- Apache ZooKeeper☆12,762May 19, 2026Updated 2 weeks ago
- Google core libraries for Java☆51,487May 23, 2026Updated last week
- jsoup: the Java HTML parser, built for HTML editing, cleaning, scraping, and XSS safety.☆11,366Updated this week
- Netty project - an event-driven asynchronous network application framework☆34,965Updated this week
- Apache HBase☆5,534May 26, 2026Updated last week
- Apache Storm☆6,687May 24, 2026Updated last week
- MyBatis SQL mapper framework for Java☆20,414May 24, 2026Updated last week
- Free and Open Source, Distributed, RESTful Search Engine☆76,759Updated this week
- Scrapy, a fast high-level web crawling & scraping framework for Python.☆61,963May 20, 2026Updated 2 weeks ago
- Spring Framework☆60,002Updated this week
- FASTJSON 2.0.x has been released, faster and more secure, recommend you upgrade.☆25,646Jul 16, 2024Updated last year
- Anthelion is a plugin for Apache Nutch to crawl semantic annotations within HTML pages.☆2,834Dec 17, 2015Updated 10 years ago
- Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-l…☆2,560May 26, 2026Updated last week
- A simple, agile, distributed Java crawler framework with Spring Boot support.☆1,995Nov 25, 2024Updated last year
- Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.☆20,726May 26, 2026Updated last week
- A plugin for crawling, fetching, and parsing AJAX pages, implemented as an extension based on Apache Nutch and Htmlunit.☆125May 5, 2015Updated 11 years ago
- Apache Kafka - A distributed event streaming platform☆32,662Updated this week
- Open source transactional distributed database. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructu…☆9,752Updated this week
- The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).☆3,782Updated this week
- Apache Hive☆5,969Updated this week
- Redisson: the high-level Java client for Redis and Valkey. Sync/Async/RxJava/Reactive API. Over 50 Valkey and Redis based Java objects an…☆24,345Updated this week
- Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and …☆14,230May 24, 2026Updated last week
- Apache Tomcat☆8,174Updated this week
- Apache Shiro is a powerful and easy-to-use Java security framework that performs authentication, authorization, cryptography, and session…☆4,434Updated this week
- Apache Spark - A unified analytics engine for large-scale data processing☆43,364Updated this week
- Apache Druid: a high performance real-time analytics database.☆14,010Updated this week
- Redis Java client☆12,328Updated this week
- Apache Flink☆26,032May 26, 2026Updated last week
- Apache ActiveMQ☆2,431Updated this week
- Spring Boot helps you to create Spring-powered, production-grade applications and services with absolute minimum fuss.☆80,724Updated this week