apache / nutch
Apache Nutch is an extensible and scalable web crawler
☆3,077Updated 2 weeks ago
Alternatives and similar repositories for nutch
Users who are interested in nutch are comparing it to the libraries listed below.
- Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.☆3,068Updated this week
- Open Source Web Crawler for Java☆4,604Updated 3 years ago
- WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web, you can set up …☆3,082Updated last month
- Easy-to-use lightweight web crawler☆2,517Updated 2 months ago
- Mirror of Apache Mahout☆2,175Updated last week
- A scalable, mature and versatile web crawler based on Apache Storm☆933Updated this week
- A scalable web crawler framework for Java.☆11,636Updated last month
- Enterprise Stream Process Engine☆3,891Updated 2 years ago
- Ehcache 3.x line☆2,067Updated last week
- A configurable web spider with an easy-to-use web console☆998Updated 7 years ago
- No longer maintained. Please contact the original author.☆666Updated 7 years ago
- Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-l…☆2,557Updated 11 months ago
- Mirror of Apache HttpClient☆1,514Updated last week
- Apache ActiveMQ Classic☆2,402Updated this week
- Apache log4j1☆869Updated 2 years ago
- Apache HBase☆5,400Updated this week
- When jsoup meets XPath.☆470Updated 2 years ago
- Apache Curator☆3,158Updated last week
- A mature, highly concurrent JDBC connection pooling library, with support for caching and reuse of PreparedStatements.☆1,309Updated last month
- Apache Lucene and Solr open-source search software☆4,372Updated last year
- Elasticsearch Java Rest Client.☆2,115Updated 2 years ago
- Apache OpenNLP☆1,543Updated last week
- A simple, agile, distributed Java crawler framework with Spring Boot support.☆1,994Updated 10 months ago
- JAVA WEB + ORM Framework☆3,263Updated last month
- Apache Kylin☆3,752Updated last week
- Jsoup study notes, with some study code and comments added.☆637Updated last year
- Apache Storm☆6,656Updated this week
- Eclipse Jetty® - Web Container & Clients - supports HTTP/3, HTTP/2, HTTP/1, websocket, servlets, and more☆4,003Updated this week
- ZooKeeper client wrapper and rich ZooKeeper framework☆2,140Updated 2 years ago
- zhihu-crawler is a high-performance, distributed, Java-based crawler project that supports a free HTTP proxy pool and horizontal scaling.☆917Updated 6 years ago