Deduplicates massive collections of articles (long texts) using simhash, Google's algorithm for large-scale web-page deduplication.
☆11 · Dec 8, 2022 · Updated 3 years ago
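A minimal Python sketch of the simhash idea follows. It is not this repository's code. It assumes the following choices: features are character bigrams weighted by frequency, which avoids needing a Chinese word segmenter; each feature is hashed with the first 64 bits of MD5; and two texts count as near-duplicates when their fingerprints differ in at most 3 bits, a threshold commonly cited for 64-bit fingerprints. All function names are hypothetical.

```python
import hashlib
from collections import Counter

BITS = 64  # fingerprint width


def feature_hash(token: str) -> int:
    """Stable 64-bit hash: first 8 bytes of MD5 (built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def features(text: str, n: int = 2) -> Counter:
    """Assumption: character n-grams weighted by frequency; no word segmenter needed."""
    chars = "".join(text.split())  # drop all whitespace
    if len(chars) < n:
        return Counter([chars]) if chars else Counter()
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simhash(text: str) -> int:
    """Each feature votes +weight/-weight on every bit; the sign of each total is the bit."""
    totals = [0] * BITS
    for token, weight in features(text).items():
        h = feature_hash(token)
        for i in range(BITS):
            totals[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(BITS) if totals[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: str, b: str, k: int = 3) -> bool:
    return hamming(simhash(a), simhash(b)) <= k


def dedupe(docs: list[str], k: int = 3) -> list[str]:
    """Keep a document only if no kept fingerprint is within k bits (linear scan)."""
    kept, prints = [], []
    for doc in docs:
        fp = simhash(doc)
        if all(hamming(fp, p) > k for p in prints):
            kept.append(doc)
            prints.append(fp)
    return kept
```

The linear scan in `dedupe` is only practical for small collections. For massive corpora, a common trick is to split each 64-bit fingerprint into four 16-bit blocks. With k = 3, two fingerprints within distance 3 must match exactly on at least one block, because 3 differing bits can touch at most 3 blocks. Each block can therefore serve as a hash-table key for finding candidates.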
Alternatives and similar repositories for Simhash-
Users that are interested in Simhash- are comparing it to the libraries listed below
- A simhash-based text deduplication algorithm ☆20 · Jun 18, 2021 · Updated 4 years ago
- Deduplicating massive amounts of text with Simhash ☆12 · Jun 2, 2018 · Updated 7 years ago
- Tool to find document on the photo and save it to pdf. ☆10 · Jul 16, 2023 · Updated 2 years ago
- A fast AES encryption/decryption library for data security ☆13 · Aug 10, 2025 · Updated 6 months ago
- A data query GUI software using PyQt5 ☆10 · May 20, 2023 · Updated 2 years ago
- Data asset management ☆10 · Dec 24, 2018 · Updated 7 years ago
- A sample installer "per machine" for excelDNA addins ☆12 · Nov 8, 2017 · Updated 8 years ago
- A collection of completed TensorFlow examples that train an image-classification model and run image recognition; supports console mode and B/S (browser/server) mode. ☆12 · Jul 31, 2017 · Updated 8 years ago
- A vue component of an SQL Editor based on CodeMirror, with a custom auto-completion ☆11 · Jul 29, 2018 · Updated 7 years ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆10 · Apr 18, 2023 · Updated 2 years ago
- Boolean evaluation and digital calculation expression engine for Java ☆12 · Apr 18, 2022 · Updated 3 years ago
- Database driver for Stetho for inspecting SQLCipher-encrypted databases. ☆10 · Mar 13, 2019 · Updated 6 years ago
- A simple, easy-to-use framework for data synchronization and export ☆11 · Feb 13, 2026 · Updated 3 weeks ago
- master-data-management system ☆12 · Jan 7, 2023 · Updated 3 years ago
- Provide minimum implementation check and connection pool of PEP249. ☆12 · Mar 2, 2023 · Updated 3 years ago
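- ☆10 · Apr 8, 2018 · Updated 7 years ago
- Video retrieval based on the SG2300X: search video content in natural language and locate the specific time segments that match the description ☆13 · Feb 29, 2024 · Updated 2 years ago
- Open source OKR application ☆14 · Updated this week
- Price Spider is a Python tool to get price & promotion from JD, Tmall, Amazon, BeiBei ☆10 · Jun 14, 2019 · Updated 6 years ago
- [NeurIPS 2025] Bag of Tricks for Inference-time Computation of LLM Reasoning ☆16 · Sep 20, 2025 · Updated 5 months ago
- Chrome Extension demo for a tutorial ☆12 · Mar 5, 2023 · Updated 3 years ago
- A pedometer built to support the team's Bluetooth indoor-positioning project ☆11 · Jan 10, 2017 · Updated 9 years ago
- A blog search system built with Springboot + ElasticSearch ☆12 · Mar 5, 2020 · Updated 6 years ago
- A public knowledge-graph exploration project ☆14 · Jul 9, 2020 · Updated 5 years ago
- GUI for pg_timetable ☆15 · Feb 27, 2026 · Updated last week
- Implementations of several common NLP algorithms: keyword extraction, named entity recognition, automatic summarization, text similarity comparison, and more ☆16 · Jan 16, 2022 · Updated 4 years ago
- Takes a newly entered piece of text, compares its similarity against existing data, and returns the top 10 most similar texts. Main techniques: jieba Chinese word segmentation, gensim, TF-IDF term weighting, and cosine similarity. ☆11 · Jul 30, 2020 · Updated 5 years ago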
- FrescoLoader is a framework which use fresco to load image into android.widget.ImageView. ☆14 · Jun 3, 2019 · Updated 6 years ago
- ☆12 · May 3, 2024 · Updated last year
- The highest performance event loop. ☆14 · Feb 12, 2026 · Updated 3 weeks ago
- A text-similarity algorithm based on Lucene, TF-IDF, and cosine similarity ☆12 · Jul 25, 2018 · Updated 7 years ago
- Drives WeCom (Enterprise WeChat) from Python 3 to send text, images, voice, video, and files; can be invoked from the command line or imported by other classes. ☆13 · Apr 4, 2019 · Updated 6 years ago
- a simple and powerful crontab written in golang with web page management for multiple tasks; supports seconds, minutes, hours, days, months, and weekdays ☆12 · Mar 14, 2021 · Updated 4 years ago
- chat log tool, easily use your own chat data. ☆23 · Oct 20, 2025 · Updated 4 months ago
- Capture photos, convert to pdf, (ocr) text recognition with tesseract, share etc (SwiftUI, Combine, Tesseract) ☆14 · Mar 14, 2021 · Updated 4 years ago
- Real-time AI guidance for Mahjong Soul (雀魂) / Majsoul-AI-Assistant-MahjongMaster / MajsouI copilot / Mahjong Soul assistant / Mahjong Soul coach / Mahjong Soul AI analysis ☆47 · Updated this week
- ☆11 · May 17, 2021 · Updated 4 years ago
- Quandl Excel Addin for Windows ☆13 · Dec 1, 2021 · Updated 4 years ago
- autocomplete with redis ☆15 · Dec 5, 2013 · Updated 12 years ago
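Several of the alternatives listed above, such as the jieba + gensim + TF-IDF project and the Lucene-based one, rank texts by TF-IDF cosine similarity rather than by simhash fingerprints. The following minimal pure-Python sketch shows that ranking. It assumes the texts are already tokenized, leaving out the jieba segmentation step. It also uses the smoothed IDF `log((1 + N) / (1 + df)) + 1`, which is one common variant and not necessarily what those projects use. The function names are hypothetical.

```python
import math
from collections import Counter


def build_idf(corpus: list[list[str]]):
    """Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1."""
    n = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    return lambda term: math.log((1 + n) / (1 + df[term])) + 1.0


def tfidf(doc: list[str], idf) -> dict[str, float]:
    """Raw term count times IDF for each term in the document."""
    return {term: count * idf(term) for term, count in Counter(doc).items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: list[str], corpus: list[list[str]], k: int = 10) -> list[tuple[float, int]]:
    """Return (score, corpus index) pairs for the k most similar documents."""
    idf = build_idf(corpus)
    q = tfidf(query, idf)
    scored = [(cosine(q, tfidf(doc, idf)), i) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]
```

The two approaches serve different goals. Simhash compresses each text into a fixed 64-bit fingerprint, so near-duplicate checks reduce to cheap bit comparisons. TF-IDF cosine similarity instead produces graded scores, which suit "return the top 10 similar texts" retrieval.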