Text corpus data cleaning for TXT files: 1> merge TXT files; 2> filter out noise strings; 3> mask the names of people, places, and organizations; 4> convert other encodings to UTF-8
☆19 · Oct 14, 2022 · Updated 3 years ago
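The four steps in the description above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the encoding fallback order, the `NOISE` pattern, the `***` mask, and all function names are assumptions, and masking here is a plain dictionary lookup rather than real named-entity recognition.

```python
import re
from pathlib import Path

# Assumed fallback order; the repository's own detection logic is not described.
# "utf-8-sig" also reads plain UTF-8 and drops a leading BOM if present.
CANDIDATE_ENCODINGS = ("utf-8-sig", "gb18030")

# Hypothetical noise pattern: URLs and runs of three or more decoration symbols.
NOISE = re.compile(r"https?://\S+|[=*#~_-]{3,}")


def decode_bytes(raw):
    """Return text from the first candidate encoding that decodes cleanly."""
    for enc in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep the file, replacing undecodable bytes.
    return raw.decode("utf-8", errors="replace")


def mask_terms(text, terms, mask="***"):
    """Replace each listed name with the mask, longest names first."""
    for term in sorted(terms, key=len, reverse=True):
        text = text.replace(term, mask)
    return text


def clean_text(text, terms):
    """Strip noise strings, then mask the supplied names."""
    return mask_terms(NOISE.sub("", text), terms)


def merge_txt_files(src_dir, out_path, terms):
    """Clean every *.txt file in src_dir and write one merged UTF-8 file.

    Returns the number of files merged. out_path should live outside
    src_dir so that a rerun does not pick up the merged file itself.
    """
    parts = []
    for path in sorted(Path(src_dir).glob("*.txt")):
        parts.append(clean_text(decode_bytes(path.read_bytes()), terms))
    Path(out_path).write_text("\n".join(parts), encoding="utf-8")
    return len(parts)
```

A call such as `merge_txt_files("corpus", "merged.txt", ["张三", "北京"])` would write a single UTF-8 file with the listed names masked. Trying encodings in a fixed order is a simplification: some byte sequences decode without error under the wrong encoding, so a production pipeline would likely need a proper detector.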
Alternatives and similar repositories for txtfilemerge
Users who are interested in txtfilemerge are comparing it to the libraries listed below.
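- A Python toolkit for file processing, text cleaning, and data splitting. ☆36 · Oct 18, 2022 · Updated 3 years ago
- Chinese and English corpus data cleaning, plus distributed sentence and word segmentation preprocessing ☆12 · Mar 28, 2020 · Updated 6 years ago
- Course-selection crawler for Yunnan University that sends alerts for remaining seats and grabs courses automatically ☆21 · Jan 19, 2026 · Updated 4 months ago
- A full-stack AI agent application platform based on Java Spring Boot and Vue3, integrating LLM chat, RAG knowledge bases, autonomous agent planning, toolchain invocation, MCP services, and other cutting-edge AI technologies. The platform supports multi-turn dialogue, knowledge retrieval, automated task execution, and more, and is suited to AI application development, intelligent assis… ☆27 · Jun 26, 2025 · Updated 11 months ago
- bumble bee transformer ☆14 · Apr 19, 2021 · Updated 5 years ago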
- Scrapes bibliographic information from CNKI pages and saves it to Excel ☆19 · Jan 7, 2019 · Updated 7 years ago
- Demo for DART, Audio Imagination workshop submission in NeurIPS 2024 ☆15 · Apr 22, 2026 · Updated last month
- Dynamic Topic Modelling Tutorial Files ☆14 · May 12, 2015 · Updated 11 years ago
- Given a new input text, compares it against existing data by similarity and returns the top 10 most similar texts. Main methods: jieba Chinese word segmentation, gensim, TF-IDF term weighting, and cosine similarity. ☆11 · Jul 30, 2020 · Updated 5 years ago
- Obtains Chinese character and word vectors using BERT ☆10 · Jan 18, 2022 · Updated 4 years ago
- LLM evaluation on 2024 Chinese Gaokao Mathematics: zero-contamination benchmark with dual prompt formats ☆19 · Apr 15, 2026 · Updated last month
- Latent Dirichlet Allocation and Dynamic Topic Modeling ☆10 · Aug 11, 2021 · Updated 4 years ago
- Tensorflow Implementation of "Theory and Experiments on Vector Quantized Autoencoders" ☆15 · Feb 27, 2019 · Updated 7 years ago
- Demo for the calculation of the Semantic Brand Score (Basic Version) ☆13 · Sep 1, 2020 · Updated 5 years ago
- 🎯 Enterprise-grade AI assistant rule system (Chinese edition), built for Chinese developers, with one-click installation and configuration for mainstream AI tools such as Augment, Cursor, Claude Code, and Trae AI ☆28 · Aug 1, 2025 · Updated 9 months ago
- Small tutorial on how you can use BERT for Topic Modeling ☆18 · Jun 1, 2021 · Updated 4 years ago
- A demonstration of how to train a custom tokenizer similar to TikToken. ☆15 · Jan 6, 2025 · Updated last year
- speech-aligner is a tool that produces phoneme-level time-alignment annotations from human speech and its corresponding text. ☆15 · Dec 19, 2018 · Updated 7 years ago
- This repository holds the code for the paper "Online Academic Course Performance Prediction using Relational Graph Convolutional Neural Network" ☆11 · Jul 25, 2024 · Updated last year
- Visual comparison of word vectors trained with word2vec and word vectors generated by BERT ☆15 · Jun 29, 2020 · Updated 5 years ago
- Source code of EfficientTTS 2 ☆20 · Feb 18, 2024 · Updated 2 years ago
- Using LLMs to chat with knowledge ☆21 · Aug 12, 2023 · Updated 2 years ago
- Various Text-to-speech (TTS) papers based on Deep-learning ☆14 · Feb 26, 2021 · Updated 5 years ago
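- Basic architecture of graphrag ☆46 · Oct 17, 2024 · Updated last year
- Text2Neo4j is a tool that traverses documents, extracts relationships from text, and saves them to a Neo4j database to form a knowledge graph. The project combines Dify and LLaMA3.1 (8B model) to process and extract complex relationships efficiently. ☆24 · Aug 31, 2024 · Updated last year
- Pre-trained grapheme-to-phoneme (G2P) models ☆26 · Jul 27, 2021 · Updated 4 years ago
- Knowledge-base-augmented question answering system based on LLM APIs (local or commercial), with a neo4j knowledge graph implementation included ☆50 · Jun 10, 2025 · Updated 11 months ago
- A lightweight audio codec based on a single quantizer ☆70 · Aug 15, 2025 · Updated 9 months ago
- An implementation of the exponential random graph model ☆28 · May 14, 2014 · Updated 12 years ago
- ☆31 · Oct 29, 2024 · Updated last year
- Lightweight Zhihu crawler supporting questions, collections, and this month's hottest posts ☆24 · Dec 19, 2018 · Updated 7 years ago
- CSDN of ManVictor ☆23 · Mar 31, 2025 · Updated last year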
- Training word vectors ☆22 · Sep 26, 2020 · Updated 5 years ago
- A Unity package for making annotations on arbitrary Unity scenes of architectural sites ☆15 · Dec 11, 2017 · Updated 8 years ago
- [ACL 2026 Main] Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training ☆72 · Apr 6, 2026 · Updated last month
- Zhilu (智鹿): a Chinese conversational LLM for the consumer finance domain ☆30 · Nov 12, 2023 · Updated 2 years ago
- ☆11 · May 21, 2026 · Updated last week
- Signed Distance Field Map Generator ☆10 · Jun 19, 2023 · Updated 2 years ago
- ☆36 · May 22, 2026 · Updated last week