Distributed crawler based on the Scrapy-Redis framework and Mongodb, plus building an elasticsearch search engine
☆18Apr 21, 2020Updated 6 years ago
Alternatives and similar repositories for Scrapy_spider
Users that are interested in Scrapy_spider are comparing it to the libraries listed below.
- Crawling zhihu, jobbole, lagou by Scrapy, and using Elasticsearch+Django to build a Search Engine website --- README_zh.md (including: i…☆40Aug 23, 2018Updated 7 years ago
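- The project consists of three main modules: a scrapy-redis distributed crawler for collecting data, ElasticSearch-based data retrieval, and a front-end interface. It was built to get familiar with the basic scrapy-redis workflow and the principles behind it, as well as with using ElasticSearch. This project can serve as an ES-storage-based, simple but…☆24Dec 8, 2022Updated 3 years ago
- Deduplicates massive collections of articles (long texts) using Google's simhash algorithm for large-scale web page deduplication.☆11Dec 8, 2022Updated 3 years ago
- Building a blog search system with Springboot + ElasticSearch☆12Mar 5, 2020Updated 6 years ago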
- elasticsearch7.9 cdh-ext-parcels and single machine multi instance☆10Jul 12, 2021Updated 4 years ago
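- Features: given a newly entered text, compares its similarity against existing data and returns the top 10 texts. Main methods: jieba Chinese word segmentation, gensim, TF-IDF term weighting, cosine similarity.☆11Jul 30, 2020Updated 5 years ago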
- ☆12May 3, 2024Updated 2 years ago
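- This project contains implementations of several common NLP algorithms: keywords, named entities, automatic summarization (abstract), text similarity comparison, and more☆16Jan 16, 2022Updated 4 years ago
- Python 3 implementations for defeating JS-encryption anti-crawling, CAPTCHA cracking, font-encryption anti-crawling, and other types of anti-crawler measures☆15Jun 9, 2023Updated 3 years ago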
- Performing Latent Semantic Analysis with Python on large datasets.☆13Jun 21, 2022Updated 3 years ago
- A search engine website built on elasticsearch☆14Nov 22, 2022Updated 3 years ago
- Text deduplication algorithm based on simhash☆20Jun 18, 2021Updated 5 years ago
- pinduoduo_spider☆22Feb 28, 2019Updated 7 years ago
- Code for experiments on self-prediction as a way to measure introspection in LLMs☆16Dec 10, 2024Updated last year
- Static site built with the vue-element-admin framework☆12Dec 4, 2018Updated 7 years ago
- Batch download of Douyin user videos☆21Jan 19, 2024Updated 2 years ago
- ☆21Jan 9, 2023Updated 3 years ago
- Long-text similarity model☆21Nov 24, 2023Updated 2 years ago
- Graduation project: "Design and Implementation of Video-Text Retrieval Based on the CLIP Model"☆18Jul 21, 2024Updated last year
- Data-enriching GAN for retrieving Representative Samples from a Trained Classifier☆14Sep 2, 2020Updated 5 years ago
- Official repository for the paper "Gradient-based Jailbreak Images for Multimodal Fusion Models" (https://arxiv.org/abs/2410.03489)☆20Oct 22, 2024Updated last year
- Text retrieval database based on simhash similarity search☆26Mar 27, 2023Updated 3 years ago
- ☆23Apr 10, 2025Updated last year
- A small, educational autograd system with deep neural networks support☆13Apr 29, 2023Updated 3 years ago
- demo natural language video db using CLIP☆28Aug 7, 2024Updated last year
- A list of interesting payloads, tips and tricks for bug bounty hunters.☆24Sep 1, 2019Updated 6 years ago
- A hands-on, step-by-step introduction to ShardingSphere☆15Nov 13, 2020Updated 5 years ago
- A hotel management system built with the SSM framework and the Layuimini front-end template☆21May 10, 2021Updated 5 years ago
- ☆17Nov 15, 2021Updated 4 years ago
- Source code for our AAAI 2020 paper P-SIF: Document Embeddings using Partition Averaging☆35May 2, 2020Updated 6 years ago
- Frp Openwrt plugin with support for multiple servers☆20Mar 6, 2024Updated 2 years ago
- Text similarity computation with TF-IDF + Word2vec, preferably for long texts☆24Dec 18, 2019Updated 6 years ago
- Generalizable Implicit Hate Speech Detection using Contrastive Learning (COLING 2022)☆14Oct 9, 2022Updated 3 years ago
- HTML5 rich text editor. Try the demo integration at☆20Jun 19, 2019Updated 7 years ago
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval, TIP 2024☆33Jun 18, 2025Updated last year
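- Extension package for sdenv, a browser-environment patching framework, for sharing code between the browser side and the node side☆46Dec 22, 2025Updated 5 months ago
- CAPTCHA sample data generator for image recognition projects in deep learning☆35May 22, 2018Updated 8 years ago
- Learning big data components, including dataflow, spring cloud stream, elasticsearch, flink, spark, kafka, phoenix, Hive, and Hbase☆22Jul 1, 2022Updated 3 years ago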
- Face Recognition using Deep Learning and TensorFlow Framework☆10Jul 19, 2017Updated 8 years ago