duoan/codes-scratch-crawler

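Reading notes on the book 《自己动手写网络爬虫》 (Write Your Own Web Crawler), with code typed out by hand. Mainly covers the basic implementation of a web crawler, web page deduplication algorithms, web page fingerprinting algorithms, and text information mining.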

☆47

Alternatives and similar repositories for codes-scratch-crawler

Users that are interested in codes-scratch-crawler are comparing it to the libraries listed below.

yangguang2014 / distributedCrawler
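Distributed crawler project carried out at the Gao Ying lab (高英实验室) of South China University of Technology. Not to be distributed privately by anyone other than lab members.
☆21 · Jul 13, 2014 · Updated 12 years ago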
ysc / HtmlExtractor
HtmlExtractor is a Java component for template-based, precise extraction of structured information from web pages.
☆154 · Aug 27, 2018 · Updated 7 years ago
sreejithpillai / ExcelRecordReaderMapReduce
MapReduce InputFormat that can read Excel files
☆14 · Jul 25, 2015 · Updated 11 years ago
ml-distribution / phrase-finding
Distributed machine learning algorithm for new word discovery.
☆15 · Jul 21, 2014 · Updated 12 years ago
ml-distribution / negative-sentiment
Dictionary-based algorithm for scoring negative public-opinion information.
☆26 · Dec 16, 2014 · Updated 11 years ago
redreamality / learning-to-rank
Python LTR framework
☆15 · Nov 22, 2015 · Updated 10 years ago
doubleview / fastcrawler
A fast, simple, multithreaded web crawler framework.
☆13 · Mar 3, 2017 · Updated 9 years ago
dianping / storm-util
☆18 · Apr 23, 2015 · Updated 11 years ago
anjuke / romar
Recommendation Web Service
☆17 · Apr 17, 2013 · Updated 13 years ago
19rick96 / QA
Deep Learning for Question Answering
☆21 · Jul 10, 2016 · Updated 10 years ago
duoan / codes-scratch-akka
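Notes on learning and understanding Akka, built with both Maven and sbt and implemented in both Java and Scala. An introduction to Akka that gives a clear understanding of the Akka workflow.
☆13 · Oct 18, 2015 · Updated 10 years ago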
medcl / elasticsearch-analysis-paoding
Paoding Analysis Plugin for ElasticSearch
☆21 · May 14, 2013 · Updated 13 years ago
PacktPublishing / Mastering-Machine-Learning-with-Spark-2.x
Mastering Machine Learning with Spark 2.x, published by Packt
☆43 · Jan 30, 2023 · Updated 3 years ago
blockfrost / blockfrost-java
Java SDK for the Blockfrost.io API.
☆12 · Oct 22, 2023 · Updated 2 years ago
15810856129 / Simhash
Deduplication of massive text collections using Simhash.
☆12 · Jun 2, 2018 · Updated 8 years ago
yxlHuster / news-duplicated
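Text deduplication algorithm, developed from research on deduplicating news in recommender systems. It uses Yahoo's near-duplicates and shingling algorithm. The server is implemented in C and the client in Java, and the two communicate through the Thrift framework. For extensibility, deduplication can be done on the server side, and the server also exposes computation interfaces so that clients can build their own extensions.
☆24 · Feb 25, 2014 · Updated 12 years ago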
licheng-xd / ip-locator
IP geolocation using two methods: the Chunzhen (纯真) IP database and a local text database crawled by the author.
☆13 · Jan 21, 2015 · Updated 11 years ago
Ivanzgj / HealthCare
Graduation project: a real-time blood pressure monitoring app.
☆17 · May 29, 2016 · Updated 10 years ago
spring-attic / spring-sync-samples
Samples demonstrating the use of Spring Sync
☆24 · Nov 4, 2014 · Updated 11 years ago
mlabs-haskell / the-plutus-scaffold
Up-to-date version of plutus-scaffold. It's a fuller example utilizing ctl, see the overview in README. This project contains the build s…
☆10 · May 23, 2023 · Updated 3 years ago
Jueee / PythonWebCrawlers
Study notes on Python web crawlers.
☆32 · Aug 11, 2020 · Updated 5 years ago
input-output-hk / adawallet
A single address wallet that supports mnemonics and hardware wallets
☆11 · Dec 26, 2025 · Updated 6 months ago
1m188 / mansys
Information management system.
☆16 · Feb 21, 2024 · Updated 2 years ago
zywaited / get_wx_article
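Multi-process, multi-threaded Python scraper that collects WeChat official account article information from the Qingbo Big Data (清博大数据) website.
☆11 · Jun 25, 2016 · Updated 10 years ago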
danwenxuan / poem_generator
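Generates Tang poems from a Tang poetry corpus through denoising preprocessing, word segmentation, collocation generation, topic generation, and related steps. Based on Python.
☆15 · Aug 14, 2017 · Updated 8 years ago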
CardanoSolutions / ogmios-ts-client-starter-kit
A Demeter starter kit that shows how to use Ogmios' TypeScript client
☆13 · Jan 3, 2024 · Updated 2 years ago
mlabs-haskell / cardano-dev-wallet
A browser-based Cardano wallet for developers & testers
☆13 · Jun 2, 2025 · Updated last year
gsh199449 / DistributeCrawler
Map/Reduce-based crawler that extracts the main text of news articles from major news sites and performs classification and clustering.
☆73 · Jan 5, 2014 · Updated 12 years ago
reallin / Bayes
Bayesian classification algorithm implemented in Java, intended for classifying big data.
☆42 · Nov 24, 2015 · Updated 10 years ago
Gyoliu / phoenix-hbase
Integration of Phoenix-based HBase access with Spring Boot.
☆11 · Dec 7, 2017 · Updated 8 years ago
gcase / spring-data-rest-datatable-example
☆29 · Aug 29, 2012 · Updated 13 years ago
Json-Lin / spring-boot-practice
Some spring-boot samples
☆11 · Sep 1, 2019 · Updated 6 years ago
karthikdivi / spring-react-material-boilerplate
Spring Boot + ReactJS + MaterialUI boilerplate
☆14 · Nov 21, 2018 · Updated 7 years ago
bloxbean / cardano-client-examples
Examples - How to use Cardano-client-lib
☆16 · Mar 2, 2026 · Updated 4 months ago
August1s / Relationships-Find-by-Python
Co-occurrence-based statistics on character relationships in the novel In the Name of the People (《人民的名义》).
☆12 · Apr 22, 2018 · Updated 8 years ago
nidhinek / android-tooltip
Create a tooltip to show information about a widget such as a button. Many apps require a coach mark displayed on the screen to guide users on its …
☆10 · Mar 10, 2017 · Updated 9 years ago
sky-xsk / elem
A small Vue 2.0 food-delivery project suitable for beginners, continuously updated.
☆14 · May 21, 2019 · Updated 7 years ago
RayHuangCN / yatos
Yet another tiny OS
☆17 · Jul 28, 2017 · Updated 8 years ago
linger2012 / recommendation-algorithm-implemented-by-java
Recommendation algorithms.
☆30 · Jun 5, 2015 · Updated 11 years ago