Deduplicating massive text collections with Simhash
☆12 · Jun 2, 2018 · Updated 7 years ago
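Simhash reduces a document to a short fingerprint, and near-duplicate texts end up a small Hamming distance apart. Below is a minimal Python sketch of the technique, not the repository's own code. It rests on these assumptions: tokens are overlapping character bigrams rather than segmented words, each token is weighted by its term frequency, token hashes are the first 8 bytes of MD5, and the near-duplicate cutoff is 3 bits, a value often cited for 64-bit fingerprints. The helper names `shingles`, `token_hash`, `simhash`, `hamming`, and `is_near_duplicate` are hypothetical.

```python
from collections import Counter
import hashlib


def shingles(text: str, n: int = 2) -> list[str]:
    # Overlapping character n-grams: a hypothetical stand-in for a real
    # Chinese word segmenter, chosen only to keep the sketch self-contained.
    if len(text) <= n:
        return [text]
    return [text[i : i + n] for i in range(len(text) - n + 1)]


def token_hash(token: str, bits: int = 64) -> int:
    # Stable per-token hash: the first bits/8 bytes of the MD5 digest.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(tokens: list[str], bits: int = 64) -> int:
    # For every bit position, add the token's weight when its hash has that
    # bit set and subtract it otherwise (weight = term frequency here).
    counts = [0] * bits
    for token, weight in Counter(tokens).items():
        h = token_hash(token, bits)
        for i in range(bits):
            counts[i] += weight if (h >> i) & 1 else -weight
    # Bit i of the fingerprint is 1 where the weighted sum is positive.
    fingerprint = 0
    for i, total in enumerate(counts):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    # Number of bit positions where two fingerprints differ.
    return bin(a ^ b).count("1")


def is_near_duplicate(a: str, b: str, threshold: int = 3) -> bool:
    # Near-duplicates: fingerprints differ in at most `threshold` bits.
    return hamming(simhash(shingles(a)), simhash(shingles(b))) <= threshold


if __name__ == "__main__":
    text = "使用Simhash对海量文本进行去重"
    print(is_near_duplicate(text, text))  # identical texts: True
```

Comparing every pair of fingerprints is quadratic. At scale, the 64-bit fingerprint is usually split into blocks and indexed, so that only documents sharing at least one identical block are compared bit by bit.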
Alternatives and similar repositories for Simhash
Users interested in Simhash are comparing it to the repositories listed below
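- A simhash-based text deduplication algorithm · ☆20 · Jun 18, 2021 · Updated 4 years ago
- Deduplicates massive collections of articles (long texts) with simhash, the algorithm Google uses for large-scale web-page deduplication. · ☆11 · Dec 8, 2022 · Updated 3 years ago
- Enterprise event extraction · ☆13 · May 20, 2021 · Updated 4 years ago
- Duplicate checking for massive volumes of content with the simhash algorithm · ☆14 · Apr 23, 2016 · Updated 9 years ago
- Some articles from gitchat VIP · ☆14 · Dec 19, 2021 · Updated 4 years ago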
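- MBC: Memory Bank Compression for Continual Adaptation of Large Language Models · ☆20 · Sep 22, 2025 · Updated 5 months ago
- Personal knowledge management with custom hierarchical categories and tags · ☆10 · Apr 2, 2019 · Updated 6 years ago
- Paper: A Simple and Effective Neural Model for Joint Word Segmentation and POS Tagging · ☆35 · Mar 7, 2019 · Updated 6 years ago
- [WWW '24] UnifiedSSR: A Unified Framework of Sequential Search and Recommendation · ☆12 · Feb 16, 2024 · Updated 2 years ago
- A chatbot implemented in PyTorch with a seq2seq model · ☆10 · Feb 9, 2020 · Updated 6 years ago
- Desktop client for a driving-school online exam simulator. Subject 1 and Subject 4 support voice read-aloud, explanations of wrong answers, and more. Tech stack: develop once and adapt to multiple platforms (web, plus a generated desktop installer), mainly using electron-builder, the full Vue stack, and element-ui. · ☆14 · Aug 5, 2020 · Updated 5 years ago
- ☆10 · Feb 23, 2021 · Updated 5 years ago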
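- 2018 Yunyi Cup (云移杯) scenic-area review score prediction, 7/1186 · ☆11 · Jul 16, 2018 · Updated 7 years ago
- Code release for "Learning from Missing Relations: Contrastive Learning with Commonsense Knowledge Graphs for Commonsense Inference" · ☆10 · Jun 25, 2022 · Updated 3 years ago
- LAVIS - A One-stop Library for Language-Vision Intelligence · ☆10 · Apr 18, 2023 · Updated 2 years ago
- Code & data for IJCAI'22 paper "RecipeRec: A Heterogeneous Graph Learning Model for Recipe Recommendation". · ☆16 · Jul 24, 2022 · Updated 3 years ago
- Information Extraction related tools and models · ☆10 · Mar 16, 2023 · Updated 2 years ago
- Reproduction of SimCSE in a Jupyter notebook · ☆10 · Nov 28, 2021 · Updated 4 years ago
- A database synchronization solution · ☆43 · Sep 25, 2024 · Updated last year
- online-exam-backend is the backend module of an online exam system: a RESTful service built on Jersey + Spring, covering user management, online exams, automatic grading, score management, wrong-answer management, a message board, exam-paper management, question-bank management, and question-subject maintenance. · ☆11 · Mar 19, 2021 · Updated 4 years ago
- Front-end demo of a monitoring-system admin console, using Vue, element-ui, ECharts, and MQTT · ☆13 · Jan 29, 2024 · Updated 2 years ago
- Generates Tang poems from a Tang-poetry corpus through denoising preprocessing, word segmentation, collocation generation, topic generation, and related steps. Written in Python. · ☆14 · Aug 14, 2017 · Updated 8 years ago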
- ☆12 · Apr 12, 2024 · Updated last year
- An attempt at a Kaggle competition, the Personalized Web Search Challenge, hosted by Yandex (http://www.kaggle.com/c/yandex-personalized-web-sea… · ☆11 · Jan 3, 2014 · Updated 12 years ago
- [NeurIPS 2025] Bag of Tricks for Inference-time Computation of LLM Reasoning · ☆16 · Sep 20, 2025 · Updated 5 months ago
- ☆10 · Jun 28, 2015 · Updated 10 years ago
- Applied a BERT-based model to extract relations from 29 annual reports of listed companies and from news; used the spaCy library and a BERT model for … · ☆13 · Feb 2, 2022 · Updated 4 years ago
- KDD Taobao long-tail recommendation; see https://tianchi.aliyun.com/competition/entrance/231785/information · ☆11 · Jul 2, 2020 · Updated 5 years ago
- My solution, placed #12 on the private leaderboard (score = 0.0260809843625832) · ☆11 · Sep 6, 2021 · Updated 4 years ago
- Chinese and English corpus data cleaning, plus distributed sentence-splitting and word-segmentation preprocessing · ☆12 · Mar 28, 2020 · Updated 5 years ago
- A document deduplication feature that addresses semantically duplicate documents in search engines, using a semantic fingerprinting algorithm built on multiple hashes. · ☆12 · Aug 17, 2013 · Updated 12 years ago
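- ZEGO GoClass is a ready-to-use scenario solution that education institutions can deploy and operate directly. It is built on ZEGO's interactive audio/video service, ZEGO's interactive whiteboard service (ZegoWhiteboard), and ZEGO's cloud recording service, and is designed around common online-education scenarios and requirements. · ☆10 · Aug 4, 2022 · Updated 3 years ago
- lightsmile's personal mini general-purpose crawler framework for scraping publicly available corpus data from the web · ☆13 · Sep 30, 2020 · Updated 5 years ago
- Companion code for the book Artificial Intelligence in Finance (《金融中的人工智能》) · ☆11 · Sep 20, 2022 · Updated 3 years ago
- An exam system with automatic test-paper assembly based on particle swarm optimization · ☆13 · Jan 5, 2018 · Updated 8 years ago
- 🕷 Python 3 web crawler · ☆11 · Jul 4, 2019 · Updated 6 years ago
- AI Challenger Image Caption Competition · ☆10 · Dec 13, 2017 · Updated 8 years ago
- Chest CT Computer-Aided Detection For Pulmonary Nodules · ☆11 · Oct 26, 2014 · Updated 11 years ago
- Link-prediction experiments using GraphSAGE · ☆13 · Mar 29, 2019 · Updated 6 years ago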