zhihu-crawler is a high-performance, distributed Java crawler project that supports a free HTTP proxy pool and horizontal scaling.
☆918 · Apr 2, 2019 · Updated 6 years ago
Alternatives and similar repositories for zhihu-crawler
Users interested in zhihu-crawler compare it with the libraries listed below.
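- A Java crawler application based on webmagic ☆2,782 · Jan 8, 2022 · Updated 4 years ago
- A simple, agile, distributed Java crawler framework with Spring Boot support. ☆1,997 · Nov 25, 2024 · Updated last year
- Easy-to-use lightweight web crawler ☆2,515 · Jan 23, 2026 · Updated 2 months ago
- A hands-on Java crawler framework built on top of the webmagic framework. It can already crawl news content from Tencent, Sohu, Toutiao (as a separately integrated feature), and other sites; combined with Elasticsearch, it provides automated crawling and has been deployed in production. ☆341 · Nov 16, 2022 · Updated 3 years ago
- Zhihu crawler that can extract follow relationships ☆307 · Jun 7, 2025 · Updated 9 months ago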
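- A scalable web crawler framework for Java. ☆11,696 · Dec 20, 2025 · Updated 3 months ago
- Sina Weibo crawler written in Java, based on HTTPClient 4.0, storing crawled data in MySQL and supporting concurrent multi-process execution. Features include crawling posts, comments, reposts, and (hierarchical) following lists. Updated continuously according to data requirements... ☆356 · Feb 27, 2014 · Updated 12 years ago
- Crawls Zhihu user information, images, and recommended content in plain Java (no framework) and saves them locally or to a database ☆389 · Jan 21, 2017 · Updated 9 years ago
- Zhihu crawler based on the webmagic framework ☆69 · May 26, 2016 · Updated 9 years ago
- A Java crawler project based on Weibo user data ☆319 · Aug 18, 2020 · Updated 5 years ago
- WebCollector is an open-source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up … ☆3,093 · Feb 10, 2026 · Updated last month
- Open Source Web Crawler for Java ☆4,630 · Nov 4, 2021 · Updated 4 years ago
- "奇伢爬虫" (Qiya Crawler) is built on Spring Boot and WebMagic to crawl articles from WeChat Official Accounts, news sites, CSDN, info, and other websites. Crawling rules and cleaning rules can be configured dynamically, and it can crawl articles from most websites. ☆323 · Sep 3, 2017 · Updated 8 years ago
- A configurable web spider with an easy-to-use web console ☆997 · Aug 21, 2018 · Updated 7 years ago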
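- A distributed, agile development system architecture based on Spring + Spring MVC + MyBatis, providing a full set of common microservice modules: centralized permission management (single sign-on), content management, payment center, user management (with third-party login), WeChat platform, storage system, configuration center, log analysis, tasks and notifications, and more. Supports service governance, monitoring, and tracing, striving to give small and medium-sized enterprises… ☆16,704 · Dec 16, 2022 · Updated 3 years ago
- Uses Java + httpclient + httpcleaner to crawl product information from e-commerce sites with multithreading and distribution. Data is stored in HBase, Solr indexes the products, a Redis queue holds a shared URL repository, and ZooKeeper monitors the lifecycle of crawler nodes. ☆233 · Nov 6, 2020 · Updated 5 years ago
- github: https://github.com/kanwangzjm/funiture, a Spring project covering permission management, system monitoring, dynamic adjustment of scheduled tasks, QPS limiting, SQL monitoring (with email alerts), CAPTCHA service, short-link service, dynamic configuration, and more ☆1,873 · Nov 15, 2023 · Updated 2 years ago
- A small CSDN blog crawler written with WebMagic ☆91 · Jun 7, 2018 · Updated 7 years ago
- Weather crawler (automatically fetches and updates weather for cities and towns nationwide on a schedule and exposes a RESTful query API), with a proxy IP pool that is refreshed periodically and checked for availability ☆367 · Jun 25, 2018 · Updated 7 years ago
- 1. Web crawling 2. Multithreading and thread pools 3. Full-text search 4. Hadoop distributed platform, HDFS/MapReduce, ZooKeeper, HBase 5. Redis distributed caching 6. Integrated WeChat Official Account development 7. New features in Spring 4 8. ActiveMQ 9. Detailed Nginx configuration… ☆16 · Nov 16, 2022 · Updated 3 years ago
- Lagou.com data crawler ☆32 · Sep 22, 2017 · Updated 8 years ago
- Maven version of DistributeCrawler ☆10 · Jun 20, 2022 · Updated 3 years ago
- Scheduled crawling with an IP proxy pool ☆150 · Apr 11, 2018 · Updated 7 years ago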
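- Apache Nutch is an extensible and scalable web crawler ☆3,143 · Feb 27, 2026 · Updated last month
- A simple, easy-to-use crawler framework with a built-in proxy management module and flexible multithreaded crawling ☆63 · Feb 23, 2017 · Updated 9 years ago
- Aims to be the best online Java study notes, with blog explanations and source-code examples covering Java SE and Java Web ☆4,290 · Jan 8, 2022 · Updated 4 years ago
- A proxy IP pool for crawlers ☆568 · Sep 6, 2019 · Updated 6 years ago
- 🐝 Web vertical crawler framework for fun ☆193 · Dec 16, 2023 · Updated 2 years ago
- A full-stack Spring Cloud + Vue + OAuth 2.0 hands-on project: a mock online store with separated front end and back end, a complete shopping flow, and a back-office operations platform, for quickly building enterprise-grade microservice projects. Supports third-party logins such as WeChat. ☆9,885 · Oct 9, 2023 · Updated 2 years ago
- [Big Tech Interview Column] A technical guide for Java programmers, with interview questions, system architecture, career tips, mainstream middleware, and more, to help you level up! ☆14,701 · Jul 21, 2025 · Updated 8 months ago
- Spring source code reading ☆13,762 · Mar 24, 2023 · Updated 3 years ago
- Storm + Kafka stream data processing system ☆20 · Oct 10, 2018 · Updated 7 years ago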
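- A collection of crawler project source code, using Redis for URL caching, HBase for storing detailed records, and ZooKeeper for monitoring crawler thread status. ☆19 · Oct 7, 2015 · Updated 10 years ago
- Source code for the book 《Java多线程编程实战指南(设计模式篇)》 (Java Multithreaded Programming in Practice: Design Patterns) ☆663 · Mar 16, 2020 · Updated 6 years ago
- A lightweight web crawler framework for Java ☆756 · Dec 20, 2025 · Updated 3 months ago
- NetDiscovery is a general-purpose crawler framework/middleware built on Vert.x, RxJava 2, and other frameworks. ☆646 · Nov 28, 2020 · Updated 5 years ago
- A distributed web crawler designed around Hadoop principles. ☆85 · Mar 8, 2016 · Updated 10 years ago
- Douban movie crawler: crawls movie details and short comments, saves them to a MySQL database, and also includes sentiment analysis ba… ☆69 · Mar 24, 2019 · Updated 7 years ago
- WeChat development Java SDK, supporting back-end development for WeChat Pay, Open Platform, Mini Programs, WeCom (Enterprise WeChat), Channels, Official Accounts, and more ☆32,667 · Mar 22, 2026 · Updated last week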