Deduplicating massive text collections with Simhash
☆12Jun 2, 2018Updated 7 years ago
Alternatives and similar repositories for Simhash
Users that are interested in Simhash are comparing it to the libraries listed below.
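- A simhash-based text deduplication algorithm☆20Jun 18, 2021Updated 4 years ago
- Deduplicates massive collections of articles (long texts) using simhash, Google's algorithm for large-scale web page deduplication.☆11Dec 8, 2022Updated 3 years ago
- Enterprise event extraction☆13May 20, 2021Updated 4 years ago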
- A tool for extracting chunks from Penn Chinese Treebank☆18Jan 12, 2018Updated 8 years ago
- Data cleaning for Chinese and English corpora, with distributed sentence splitting and word segmentation preprocessing☆12Mar 28, 2020Updated 5 years ago
- Implement some ML algorithms in scala☆21Jul 25, 2023Updated 2 years ago
- this repo is mnbvc text quality classification using fastText☆16Oct 2, 2023Updated 2 years ago
- GUI for pg_timetable☆15Feb 27, 2026Updated 3 weeks ago
- Duplicate detection over massive content using the simhash algorithm☆14Apr 23, 2016Updated 9 years ago
- A project of N-gram model comparing FMM/BMM☆17Oct 17, 2022Updated 3 years ago
- 1-day Deep Learning with R workshop at RStudio::conf 2019☆35Jan 22, 2019Updated 7 years ago
- LAVIS - A One-stop Library for Language-Vision Intelligence☆10Apr 18, 2023Updated 2 years ago
- weighted indexes for clustering evaluation☆14Apr 18, 2020Updated 5 years ago
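- Scrapy + selenium/webdriver + random User-Agent + IP proxy + twisted ConnectionPool + mysql: a whole-site crawler for an unnamed "shu" (某书) site☆15Dec 8, 2022Updated 3 years ago
- 2018 National Postgraduate Mathematical Contest in Modeling (Problem C: quantitative analysis of terrorist attack incident record data)☆12Jul 18, 2019Updated 6 years ago
- SDK for receiving and replying to messages in Enterprise WeChat (企业微信)☆16Oct 30, 2020Updated 5 years ago
- A simple and powerful crontab written in golang with web page management. A simple, convenient scheduled-task management system implemented in golang, with a built-in web interface for managing multiple tasks. Supports seconds, minutes, hours, days, months, and weeks☆12Mar 14, 2021Updated 5 years ago
- Generates Tang poetry from a Tang poetry corpus through denoising preprocessing, word segmentation, collocation generation, and topic generation. Based on Python☆15Aug 14, 2017Updated 8 years ago
- Multi-process, multi-threaded Python scraper for WeChat official account article information from the 清博大数据 (Qingbo Big Data) website☆11Jun 25, 2016Updated 9 years ago
- Operates Enterprise WeChat (企业微信) from python3: sends text, images, voice, video, and files; can be invoked from the command line or imported by other classes.☆13Apr 4, 2019Updated 6 years ago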
- [NeurIPS 2025] Bag of Tricks for Inference-time Computation of LLM Reasoning☆16Sep 20, 2025Updated 6 months ago
- Attempt on a Kaggle competition, Personalized Web Search Challenge, hosted by Yandex (http://www.kaggle.com/c/yandex-personalized-web-sea…☆11Jan 3, 2014Updated 12 years ago
- Co-occurrence-based statistics of character relationships in the novel 《人名的名义》☆12Apr 22, 2018Updated 7 years ago
- reproduce SimCSE in jupyter-notebook☆10Nov 28, 2021Updated 4 years ago
- Code release for "Learning from Missing Relations: Contrastive Learning with Commonsense Knowledge Graphs for Commonsense Inference"☆10Jun 25, 2022Updated 3 years ago
- [WWW '24] UnifiedSSR: A Unified Framework of Sequential Search and Recommendation☆12Feb 16, 2024Updated 2 years ago
- Applied BERT based model to extract relations from 29 annual reports of listed companies and news; Used spaCy library and BERT model for …☆13Feb 2, 2022Updated 4 years ago
- Sample use of Amazon Personalize for a recommender system for car searches☆14Jul 17, 2019Updated 6 years ago
- KDD Taobao long-tail recommendation; see https://tianchi.aliyun.com/competition/entrance/231785/information☆11Jul 2, 2020Updated 5 years ago
- ☆22Oct 29, 2019Updated 6 years ago
- Code & data for IJCAI'22 paper "RecipeRec: A Heterogeneous Graph Learning Model for Recipe Recommendation".☆16Jul 24, 2022Updated 3 years ago
- Production First and Production Ready End-to-End Keyword Spotting Toolkit☆12May 30, 2022Updated 3 years ago
- Chat log tool: easily use your own chat data.☆29Oct 20, 2025Updated 5 months ago
- Controls WeChat from a computer using adb, Python, and similar tools, implementing an add-friend feature☆14Oct 29, 2020Updated 5 years ago
- a project about Personalization recommendation(UserCF,itemCF,LFM,Personal Rank)☆18Sep 20, 2020Updated 5 years ago
- SuperMinHash: A New Minwise Hashing Algorithm for Jaccard Similarity Estimation, Simhash and SimhashIndex☆19Nov 18, 2022Updated 3 years ago
- Source code and data for "Causal reasoning over knowledge graphs leveraging drug-perturbed and disease-specific transcriptomic signatures…☆19Jul 3, 2024Updated last year
- My solution for #12 on the private leaderboard. Score=0.0260809843625832☆11Sep 6, 2021Updated 4 years ago
- Experiment results using FM, FFM and DeepFM algorithms in Criteo Display Advertising Challenge(https://www.kaggle.com/c/criteo-display-ad…☆13Apr 15, 2020Updated 5 years ago