A text deduplication algorithm based on SimHash
☆20 · Jun 18, 2021 · Updated 5 years ago
Alternatives and similar repositories for DuplicateRemove
Users that are interested in DuplicateRemove are comparing it to the libraries listed below.
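- Deduplicating massive text collections with Simhash ☆12 · Jun 2, 2018 · Updated 8 years ago
- A PyTorch-based scaffold for Chinese text classification, with a comparison of common models. ☆18 · Apr 23, 2021 · Updated 5 years ago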
- Text quality classification for MNBVC using fastText ☆16 · Oct 2, 2023 · Updated 2 years ago
- Part-of-speech tagging using BERT ☆10 · Nov 14, 2019 · Updated 6 years ago
- GUI for pg_timetable ☆15 · Apr 13, 2026 · Updated 2 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆10 · Apr 18, 2023 · Updated 3 years ago
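- An offline Chinese annotation tool supporting NER, text classification, relation annotation, dialogue annotation, and more. ☆14 · Jul 29, 2022 · Updated 3 years ago
- Re-identification of campus bicycles at surveillance-camera image quality, including REID model recognition, vector database retrieval, and a UI display ☆11 · Feb 13, 2024 · Updated 2 years ago
- SDK for receiving and replying to messages in WeCom (WeChat Work) ☆16 · Oct 30, 2020 · Updated 5 years ago
- A blog search system built with Springboot + ElasticSearch ☆12 · Mar 5, 2020 · Updated 6 years ago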
- Pitman-Yor processes in python ☆26 · Apr 18, 2014 · Updated 12 years ago
- A simple and powerful crontab written in Go, with a built-in web interface for conveniently managing multiple scheduled tasks. Supports second, minute, hour, day, month, and week fields. ☆12 · Mar 14, 2021 · Updated 5 years ago
- pytorch ☆11 · Jul 29, 2019 · Updated 6 years ago
- A general-purpose vector search service ☆32 · Mar 21, 2022 · Updated 4 years ago
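- Functionality: given a newly input text, compare its similarity against existing data and return the top 10 texts. Main methods: jieba Chinese word segmentation, gensim, TF-IDF term weighting, and cosine similarity. ☆11 · Jul 30, 2020 · Updated 5 years ago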
- ☆12 · May 3, 2024 · Updated 2 years ago
- This project contains implementations of several common NLP algorithms: keyword extraction, named entity recognition, automatic summarization, text similarity comparison, and others ☆16 · Jan 16, 2022 · Updated 4 years ago
- aigc evals ☆10 · Dec 2, 2023 · Updated 2 years ago
- Performing Latent Semantic Analysis with Python on large datasets. ☆13 · Jun 21, 2022 · Updated 4 years ago
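- Fast duplicate detection for massive Chinese text collections ☆18 · Dec 16, 2018 · Updated 7 years ago
- A face-beautification program built with Python and OpenCV 4.0, intended for learning image processing ☆10 · Dec 14, 2019 · Updated 6 years ago
- A search engine website built on elasticsearch ☆14 · Nov 22, 2022 · Updated 3 years ago
- [Demo] Similar-title recommendation for news headlines using TF-IDF vectorization and cosine similarity ☆14 · Mar 2, 2020 · Updated 6 years ago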
- WordBias: Visualizing Intersectional Social biases encoded in Word Embeddings ☆23 · Aug 18, 2025 · Updated 10 months ago
- init ☆21 · May 31, 2019 · Updated 7 years ago
- ☆10 · Aug 21, 2021 · Updated 4 years ago
- Controlling WeChat from a computer using adb, Python, and other tools, to implement adding friends ☆14 · Oct 29, 2020 · Updated 5 years ago
- Using Seq2Seq transformers for Text2SQL task on WikiSQL dataset. ☆12 · Jan 8, 2022 · Updated 4 years ago
- Archer2.0 evolves from its predecessor by introducing ASPO, which overcomes fundamental PPO-Clip limitations to prevent premature converg… ☆31 · Oct 10, 2025 · Updated 8 months ago
- A project about personalized recommendation (UserCF, itemCF, LFM, Personal Rank) ☆18 · Sep 20, 2020 · Updated 5 years ago
- SuperMinHash: A New Minwise Hashing Algorithm for Jaccard Similarity Estimation, Simhash and SimhashIndex ☆19 · Nov 18, 2022 · Updated 3 years ago
- https://www.kaggle.com/c/nbme-score-clinical-patient-notes ☆10 · Sep 1, 2022 · Updated 3 years ago
- ☆12 · Mar 14, 2023 · Updated 3 years ago
- Change FeliCa PMm of HCE-F for Android NFC ☆11 · Apr 30, 2023 · Updated 3 years ago
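- Long-text similarity model ☆21 · Nov 24, 2023 · Updated 2 years ago
- Graduation project: "Design and Implementation of Video-Text Retrieval Based on the CLIP Model" ☆18 · Jul 21, 2024 · Updated last year
- WeChat hook ☆11 · Jan 7, 2020 · Updated 6 years ago
- A distributed crawler based on the Scrapy-Redis framework and Mongodb, with an elasticsearch-backed search engine ☆18 · Apr 21, 2020 · Updated 6 years ago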
- ☆22 · Mar 11, 2025 · Updated last year