Python script for fast deduplication of tens of millions of text records
☆ 19 · Mar 14, 2016 · Updated 10 years ago
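As a rough illustration of exact line-level deduplication at this scale, here is a minimal Python sketch. It is not the repository's actual code. It assumes one record per line in a UTF-8 text file, and the function names `dedupe_lines` and `dedupe_file` are hypothetical. The file is streamed rather than loaded whole. Instead of keeping every line in a set, the sketch keeps a 16-byte BLAKE2b digest of each unique line, so the memory cost per entry does not grow with line length.

```python
import hashlib
from typing import Iterable, Iterator


def dedupe_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in first-seen order (newline stripped)."""
    # Keep a fixed-size 16-byte BLAKE2b digest per unique line rather than the
    # line itself, so per-entry memory does not grow with line length.
    seen: set[bytes] = set()
    for line in lines:
        record = line.rstrip("\n")
        digest = hashlib.blake2b(record.encode("utf-8"), digest_size=16).digest()
        if digest not in seen:
            seen.add(digest)
            yield record


def dedupe_file(src_path: str, dst_path: str) -> int:
    """Stream src_path, write unique lines to dst_path, return how many were kept."""
    kept = 0
    with open(src_path, encoding="utf-8") as src, open(
        dst_path, "w", encoding="utf-8"
    ) as dst:
        for record in dedupe_lines(src):
            dst.write(record + "\n")
            kept += 1
    return kept
```

For example, `dedupe_file("input.txt", "deduped.txt")` (hypothetical file names) writes the unique lines in their original order and returns how many it kept. A 128-bit digest makes a collision between two different lines extremely unlikely at this scale, though not impossible. When even the digest set will not fit in RAM, an external sort such as `sort -u` is the usual fallback. It gives up the original line order.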
Alternatives and similar repositories for PythonTo-repeat-the-text-Bigdata
Users interested in PythonTo-repeat-the-text-Bigdata are comparing it to the libraries listed below.
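- A document-deduplication feature that addresses semantically duplicated documents in search engines, using a semantic-fingerprint algorithm built on multiple hashes. ☆ 11 · Aug 17, 2013 · Updated 12 years ago
- A collection of commonly used Python scripts gathered in day-to-day work. ☆ 25 · Aug 7, 2019 · Updated 6 years ago
- A microservice gateway with OAuth2 authorization, call-count limiting, and service routing. ☆ 13 · Jan 12, 2017 · Updated 9 years ago
- Dynamic online API documentation generation with Swagger2 on Spring Boot, with built-in export to static documents such as HTML, Markdown, and Confluence, plus automatic AOP logging of API operations. ☆ 11 · Aug 26, 2024 · Updated last year
- A text-deduplication algorithm developed from research on deduplicating news in a recommender system. It uses Yahoo's Near-duplicates and shingling algorithm, with the server implemented in C and the client in Java, communicating over the Thrift framework. For extensibility, deduplication can run on the server, and the server also exposes a computation interface so that clients can build their own extensions. ☆ 24 · Feb 25, 2014 · Updated 12 years ago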
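- A powerful tool for designing automated API test scenarios based on Postman. ☆ 10 · May 30, 2026 · Updated last week
- A project for collecting scripts. ☆ 10 · Jul 28, 2016 · Updated 9 years ago
- A web application for managing all SQLite databases on a server, with functionality similar to the SQLite console. ☆ 50 · Nov 30, 2019 · Updated 6 years ago
- Spring Boot auto-configuration for Alibaba Cloud OSS. ☆ 14 · Dec 23, 2016 · Updated 9 years ago
- An Electron desktop app for macOS and Windows: ad-free, clean, and a handy search tool for films and TV series. ☆ 11 · Jul 22, 2020 · Updated 5 years ago
- Oracle Berkeley DB source code ☆ 12 · May 5, 2014 · Updated 12 years ago
- Easy AT Command Terminal ☆ 13 · Aug 19, 2020 · Updated 5 years ago
- A rules engine ☆ 22 · Feb 28, 2018 · Updated 8 years ago
- ☆ 51 · Nov 8, 2025 · Updated 7 months ago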
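- A hack for the XiaoAI smart speaker ☆ 17 · Sep 4, 2024 · Updated last year
- A structured AI skill for Chinese official document writing, providing official-document writing skills based on GB/T 9704-2012. ☆ 91 · Jan 26, 2026 · Updated 4 months ago
- LZSS library for CPython ☆ 13 · May 8, 2024 · Updated 2 years ago
- A Java wrapper around Alibaba Cloud OSS object storage. ☆ 12 · Oct 19, 2018 · Updated 7 years ago
- ☆ 17 · May 28, 2026 · Updated 2 weeks ago
- pip install pysnooper_click_able: a "god-tier black-magic" decorator, rated five stars for implementation difficulty. A debugging tool that needs no breakpoints and no scattered print calls; it precisely shows the code's execution trace and lets you click through it. base pysnooper, but can click and jump to c… ☆ 22 · Nov 18, 2021 · Updated 4 years ago
- An automated-testing toolbox for test engineers. ☆ 16 · Mar 7, 2026 · Updated 3 months ago
- WebSocket communication implemented in Go, supporting a million connections on a single machine. It uses the Gin framework and nginx load balancing, can be deployed horizontally, and its instances communicate with each other over gRPC. ☆ 10 · Aug 16, 2019 · Updated 6 years ago
- ZEGO GoClass is a ready-made scenario solution that education providers can use and operate directly. It is built on ZEGO's interactive audio/video service, the ZEGO interactive whiteboard service (ZegoWhiteboard), and ZEGO cloud recording, and was developed around the common scenarios and needs of the online education industry. ☆ 10 · Aug 4, 2022 · Updated 3 years ago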
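- Uses Cython to package every script in a Python project into a single .so file and build it into a wheel (.whl) package, for deploying Python projects and protecting their source code. ☆ 14 · Jul 6, 2021 · Updated 4 years ago
- Automatically generates database design documentation. ☆ 18 · May 26, 2023 · Updated 3 years ago
- Provides inspiration when designing pages. ☆ 11 · Apr 21, 2019 · Updated 7 years ago
- Control HomeAssistant via WeChat. ☆ 32 · May 14, 2025 · Updated last year
- Lossless image compression using Python's Image library. ☆ 20 · May 21, 2019 · Updated 7 years ago
- ☆ 11 · Updated this week
- A small screenshot-translation tool with built-in screen capture and image recognition; translation calls the Baidu API. ☆ 14 · Apr 16, 2026 · Updated last month
- A PHP class for non-blocking concurrent HTTP requests, designed for scraping crawlers. ☆ 12 · Sep 10, 2025 · Updated 9 months ago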
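- Lua scripts for TouchSprite (触动精灵) that generate fake ad traffic. The backend dispatches ad-SDK tasks. The scripts support 23 screen resolutions, bot accounts, and retention reporting, with the retention percentage configurable in the backend. They can run ads on more than 10 platforms at once and report the inflated traffic data in real time. The scripts are meant to be used with the Pangu backend system and the fakeapk hook tool. ☆ 13 · Mar 30, 2018 · Updated 8 years ago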
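- ☆ 20 · May 6, 2018 · Updated 8 years ago
- SU for Windows ☆ 27 · Apr 8, 2026 · Updated 2 months ago
- Personal-use scripts; stars welcome. ☆ 22 · Apr 13, 2021 · Updated 5 years ago
- A lightweight, easily extensible open-source library for intelligently filling databases with data (Python implementation). ☆ 14 · Mar 12, 2019 · Updated 7 years ago
- Uses image recognition to help schools that cannot afford answer-sheet scanners avoid grading multiple-choice questions by hand. ☆ 16 · Jul 2, 2020 · Updated 5 years ago
- Scripts for routine maintenance. ☆ 19 · Apr 26, 2018 · Updated 8 years ago
- Computer networking / sniffer / packet capture ☆ 11 · Oct 2, 2016 · Updated 9 years ago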
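The news-deduplication entry that cites the Near-duplicates and shingling algorithm uses a technique that fits in a few lines of Python. This is a minimal sketch under illustrative assumptions, not that project's C/Java implementation. Each document is split into overlapping character 5-grams (shingles). Two documents count as near-duplicates when the Jaccard similarity of their shingle sets reaches a threshold. The value k=5, the 0.8 threshold, and the function names are hypothetical choices.

```python
from itertools import combinations


def shingles(text: str, k: int = 5) -> set[str]:
    """Return the set of character k-grams (shingles) of a normalized text."""
    # Lowercase and collapse runs of whitespace so trivial formatting
    # differences do not produce different shingles.
    normalized = " ".join(text.lower().split())
    if len(normalized) < k:
        return {normalized}
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| of two shingle sets."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def near_duplicate_pairs(
    docs: dict[str, str], k: int = 5, threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """Compare every pair of documents and return those at or above threshold.

    This is an O(n^2) brute-force comparison, suitable only for small collections.
    """
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    pairs = []
    for (id_a, set_a), (id_b, set_b) in combinations(sets.items(), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


if __name__ == "__main__":
    sample = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "The quick  brown fox jumps over the lazy dog!",
        "c": "completely different text about databases",
    }
    for id_a, id_b, score in near_duplicate_pairs(sample):
        print(f"{id_a} ~ {id_b}: {score:.3f}")
```

Comparing every pair is quadratic in the number of documents. At the scale of tens of millions of records, shingle sets are usually compressed into MinHash signatures, and locality-sensitive hashing finds the candidate pairs instead of a brute-force comparison.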