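The project is divided into three modules: distributed data crawling with scrapy-redis, data retrieval based on ElasticSearch, and front-end display. It was built to become familiar with the basic workflow of scrapy-redis and the principles behind it, as well as with using ElasticSearch. The project can serve as a simple but fairly complete full-stack demo built on ES storage. All components were set up in a local Windows 10 environment (pseudo-distributed) to demonstrate the project workflow. You can use this project as a reference and extend it across multiple hosts to achieve distributed ES and distributed Scrapy.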
☆25 · Dec 8, 2022 · Updated 3 years ago
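To illustrate how the scrapy-redis module lets several crawler processes share work, here is a minimal sketch of a typical scrapy-redis `settings.py`. It is not taken from this repository. The setting names and class paths (`SCHEDULER`, `DUPEFILTER_CLASS`, `SCHEDULER_PERSIST`, `SCHEDULER_QUEUE_CLASS`, `REDIS_URL`, `scrapy_redis.pipelines.RedisPipeline`) come from scrapy-redis. The Redis address and the pipeline priority are assumptions for a single local, pseudo-distributed machine.

```python
# settings.py (sketch): route Scrapy's scheduling and deduplication through Redis
# so that several crawler processes, on one machine or many, share one request queue.

# Replace Scrapy's in-memory scheduler with the Redis-backed one from scrapy-redis.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Store request fingerprints in a Redis set so every worker skips URLs already seen.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and fingerprint set in Redis when a spider stops, allowing pause/resume.
SCHEDULER_PERSIST = True

# Order pending requests by priority (scrapy-redis also ships FifoQueue and LifoQueue).
SCHEDULER_QUEUE_CLASS = "scrapy_redis.queue.PriorityQueue"

# Assumption: a single local Redis instance, as in a pseudo-distributed setup.
REDIS_URL = "redis://localhost:6379"

# Optionally push scraped items into a Redis list; a separate process could then
# index them into ElasticSearch. The priority value 300 is an arbitrary choice.
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 300,
}
```

To scale out across hosts as the description suggests, each host would run the same spider with `REDIS_URL` pointing at one shared Redis server instead of `localhost`.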
Alternatives and similar repositories for JobNews-ElasticSearch-Scrapy_redis
Users that are interested in JobNews-ElasticSearch-Scrapy_redis are comparing it to the libraries listed below.
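- Distributed movie search built on Scrapy+Elasticsearch+Django ☆31 · Jul 25, 2018 · Updated 7 years ago
- ElasticSearch+Django+Scrapy search engine ☆28 · Dec 8, 2022 · Updated 3 years ago
- Distributed crawler based on the Scrapy-Redis framework and MongoDB, with an Elasticsearch search engine ☆18 · Apr 21, 2020 · Updated 6 years ago
- Building a search engine with Python ☆30 · May 5, 2022 · Updated 3 years ago
- Uses Django to display data crawled by Scrapy and stored in MongoDB on a web page, with create, read, update, and delete functions ☆13 · Aug 16, 2018 · Updated 7 years ago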
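- Deduplicates large volumes of articles (long texts) using simhash, Google's algorithm for large-scale web page deduplication. ☆11 · Dec 8, 2022 · Updated 3 years ago
- Scrapy, tianya, 天涯; incremental crawl of all posts on Tianya's 莲蓬鬼话 board with Scrapy and Django ☆21 · Mar 20, 2025 · Updated last year
- elasticsearch7.9 cdh-ext-parcels and single machine multi instance ☆10 · Jul 12, 2021 · Updated 4 years ago
- Features: given a new piece of text, compares its similarity against existing data and returns the top 10 most similar texts. Main techniques: jieba Chinese word segmentation, gensim, TF-IDF term weighting, and cosine similarity. ☆11 · Jul 30, 2020 · Updated 5 years ago
- The most comprehensive collection of XMind mind-map summaries of programming languages, covering python/php/flask/web crawling/javascript/mysql/nosql/git ☆25 · Jun 29, 2021 · Updated 4 years ago
- ☆12 · May 3, 2024 · Updated last year
- Implementations of several common NLP algorithms: keyword extraction, named entity recognition, automatic summarization (abstract), text similarity comparison, and more ☆16 · Jan 16, 2022 · Updated 4 years ago
- Restaurant management system: practice using JDBC, the MySQL database, and the Druid connection pool; user login, table reservation, ordering, checkout, and staff management ☆12 · Feb 22, 2022 · Updated 4 years ago
- A search engine website built on Elasticsearch ☆14 · Nov 22, 2022 · Updated 3 years ago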
- Static site built with the vue-element-admin framework ☆12 · Dec 4, 2018 · Updated 7 years ago
- A simple regular expression engine! ☆10 · Apr 9, 2017 · Updated 9 years ago
- some example plots that are maybe useful? ☆11 · Feb 10, 2026 · Updated 2 months ago
- Batch-download videos from Douyin users ☆20 · Jan 19, 2024 · Updated 2 years ago
- Crawls news websites, mainly using Python and the Scrapy framework ☆25 · Mar 2, 2017 · Updated 9 years ago
- Pipeline to do headswap using face segmentation and face landmarks ☆17 · Feb 16, 2022 · Updated 4 years ago
- ☆21 · Jan 9, 2023 · Updated 3 years ago
- MySQL storage for OAuth 2.0 ☆13 · Mar 13, 2022 · Updated 4 years ago
- State Taxation Administration national VAT invoice verification platform ☆15 · Dec 16, 2020 · Updated 5 years ago
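- The DSDT and SSDTs of Lenovo G470 for hackintosh. ☆12 · Dec 23, 2017 · Updated 8 years ago
- winUtils for hadoop-3.1.2z compiled on win10 ☆10 · Jun 24, 2019 · Updated 6 years ago
- Leveraging IBM DB2’s Federation Capabilities to Perform SQL Analytics on a Sample Blockchain Insurance Application using Hyperledger Fabr… ☆12 · Sep 17, 2025 · Updated 7 months ago
- UNIX like OS ☆17 · Mar 7, 2020 · Updated 6 years ago
- Text retrieval database based on simhash similarity search ☆26 · Mar 27, 2023 · Updated 3 years ago
- A specialized search engine for securities information: design and implementation of a domain-specific search engine based on web information mining, with emphasis on topic-focused crawling. It downloads web documents from the Internet, filters, tokenizes, and transforms them, and builds an index database; a retriever then answers keyword queries entered by the user, and the searcher supports microblogs, SMS, and other short, irregular… ☆24 · Dec 3, 2018 · Updated 7 years ago
- Spark Streaming + kafka + hbase ☆15 · Nov 19, 2018 · Updated 7 years ago
- A program that converts a regular expression into NFA and DFA diagrams. ☆26 · Jul 8, 2020 · Updated 5 years ago
- My Personal Blog ☆13 · Updated this week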
- EasyX tutorial for BUAA Soft 2023 Summer ☆14 · Mar 21, 2024 · Updated 2 years ago
- A scrapy pipeline which send items to Elastic Search server ☆98 · Jan 2, 2018 · Updated 8 years ago
- HanLP: Han Language Processing , Java version ☆30 · Oct 13, 2020 · Updated 5 years ago
- A small puzzle game ☆17 · Mar 7, 2026 · Updated last month
- A hotel management system built with the SSM framework and the Layuimini front-end template ☆21 · May 10, 2021 · Updated 4 years ago
- React with Redux and Django REST Framework ☆13 · Sep 18, 2016 · Updated 9 years ago
- A Python client library for the Pdfcrowd HTML to PDF API ☆18 · Dec 10, 2025 · Updated 4 months ago