A text deduplication algorithm based on simhash
☆20 · Jun 18, 2021 · Updated 4 years ago
Alternatives and similar repositories for DuplicateRemove
Users that are interested in DuplicateRemove are comparing it to the libraries listed below.
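- Deduplicates massive collections of articles (long texts) using simhash, Google's algorithm for large-scale web page deduplication. ☆11 · Dec 8, 2022 · Updated 3 years ago
- Deduplicating massive amounts of text with Simhash ☆12 · Jun 2, 2018 · Updated 7 years ago
- A PyTorch-based scaffold for Chinese text classification, with a comparison of commonly used models. ☆18 · Apr 23, 2021 · Updated 4 years ago
- mnbvc text quality classification using fastText ☆16 · Oct 2, 2023 · Updated 2 years ago
- A concise implementation of SimCSE ☆16 · Aug 2, 2021 · Updated 4 years ago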
- Part-of-speech tagging using BERT ☆10 · Nov 14, 2019 · Updated 6 years ago
- Collapsed Gibbs sampling for Latent Dirichlet Allocation ☆18 · Jun 11, 2012 · Updated 13 years ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆10 · Apr 18, 2023 · Updated 2 years ago
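- An offline Chinese annotation tool supporting NER, text classification, relation annotation, dialogue annotation, and more. ☆14 · Jul 29, 2022 · Updated 3 years ago
- Video retrieval based on the SG2300X [search video content with natural language and locate the specific time span matching the description] ☆13 · Feb 29, 2024 · Updated 2 years ago
- Propaganda detection using fine-tuned BERT ☆20 · Jul 21, 2022 · Updated 3 years ago
- Building a blog search system with Springboot + ElasticSearch ☆12 · Mar 5, 2020 · Updated 6 years ago
- Functionality: given a new input text, compare its similarity against existing data and return the top 10 texts. Main methods: jieba Chinese word segmentation, gensim, TF-IDF term weighting, cosine similarity. ☆11 · Jul 30, 2020 · Updated 5 years ago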
- ☆12 · May 3, 2024 · Updated last year
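- Implementations of several common NLP algorithms: keywords, named entities, automatic summarization (abstract), text similarity comparison, and more ☆16 · Jan 16, 2022 · Updated 4 years ago
- aigc evals ☆10 · Dec 2, 2023 · Updated 2 years ago
- Fast duplicate detection for massive Chinese text collections ☆18 · Dec 16, 2018 · Updated 7 years ago
- A search engine website built on elasticsearch ☆14 · Nov 22, 2022 · Updated 3 years ago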
- WordBias: Visualizing Intersectional Social biases encoded in Word Embeddings ☆23 · Aug 18, 2025 · Updated 7 months ago
- Notes on computer-related topics ☆10 · Updated this week
- init ☆22 · May 31, 2019 · Updated 6 years ago
- ☆13 · Apr 16, 2022 · Updated 3 years ago
- Production First and Production Ready End-to-End Keyword Spotting Toolkit ☆12 · May 30, 2022 · Updated 3 years ago
- Chat log tool: easily use your own chat data. ☆31 · Oct 20, 2025 · Updated 5 months ago
- ☆10 · Aug 21, 2021 · Updated 4 years ago
- Controls WeChat from a computer using adb, Python, and other tools to implement adding friends ☆14 · Oct 29, 2020 · Updated 5 years ago
- A project about personalized recommendation (UserCF, itemCF, LFM, Personal Rank) ☆18 · Sep 20, 2020 · Updated 5 years ago
- Batch download of Douyin user videos ☆20 · Jan 19, 2024 · Updated 2 years ago
- https://www.kaggle.com/c/nbme-score-clinical-patient-notes ☆10 · Sep 1, 2022 · Updated 3 years ago
- ☆21 · Jan 9, 2023 · Updated 3 years ago
- Long-text similarity model ☆21 · Nov 24, 2023 · Updated 2 years ago
- Graduation project: "Design and Implementation of Video-Text Retrieval Based on the CLIP Model" ☆18 · Jul 21, 2024 · Updated last year
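- WeChat hook ☆11 · Jan 7, 2020 · Updated 6 years ago
- Frida-Seccomp: a general-purpose svc tracing and hooking solution for Android ☆19 · May 14, 2024 · Updated last year
- Building a distributed crawler with the Scrapy-Redis framework and Mongodb, plus an elasticsearch search engine ☆18 · Apr 21, 2020 · Updated 5 years ago
- A Python-based wallpaper changer for Windows ☆21 · Dec 8, 2022 · Updated 3 years ago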
- ☆23 · Mar 11, 2025 · Updated last year
- ☆13 · Jun 19, 2021 · Updated 4 years ago
- Applications of graph neural networks in recommender systems ☆13 · Aug 26, 2021 · Updated 4 years ago