Text corpus data cleaning for TXT files: 1> merge TXT files; 2> filter out noise strings; 3> mask personal names, place names, and organization names; 4> convert other encodings to UTF-8
☆19 · Oct 14, 2022 · Updated 3 years ago
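The four steps in the description could be sketched in Python roughly as follows. This is a minimal illustration, not txtfilemerge's actual code: the function names, the candidate encoding list, the noise pattern, and the use of a caller-supplied entity list (standing in for whatever name/place/organization recognition the tool performs) are all assumptions.

```python
import re
from pathlib import Path

# Assumed candidate encodings, tried in order ("utf-8-sig" also strips a BOM).
CANDIDATE_ENCODINGS = ("utf-8-sig", "gb18030", "big5")

# Hypothetical noise pattern: URLs and runs of decorative symbols.
NOISE_RE = re.compile(r"https?://\S+|[=*#~_-]{4,}")


def read_as_text(path):
    """Decode a file by trying each candidate encoding in turn."""
    raw = Path(path).read_bytes()
    for enc in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: decode as UTF-8 and replace undecodable bytes.
    return raw.decode("utf-8", errors="replace")


def clean_text(text, entities=()):
    """Remove noise strings, then mask each known entity with asterisks."""
    text = NOISE_RE.sub("", text)
    # Mask longer entities first so substrings of them are not masked early.
    for ent in sorted(entities, key=len, reverse=True):
        text = text.replace(ent, "*" * len(ent))
    return text


def merge_txt_files(src_dir, out_path, entities=()):
    """Clean every *.txt file in src_dir and write one merged UTF-8 file."""
    parts = []
    for p in sorted(Path(src_dir).glob("*.txt")):
        parts.append(clean_text(read_as_text(p), entities).strip())
    Path(out_path).write_text("\n".join(parts) + "\n", encoding="utf-8")
    return len(parts)
```

In this sketch the output file should live outside `src_dir` (or use a non-`.txt` extension) so that a second run does not merge the previous output back in.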
Alternatives and similar repositories for txtfilemerge
Users that are interested in txtfilemerge are comparing it to the libraries listed below.
- A Python toolkit for file processing, text cleaning, and data splitting. ☆35 · Oct 18, 2022 · Updated 3 years ago
- This is not remotely close to a finished product, and does not intend to nor does this claim to be working fine-tuning code for MaskGCT. … ☆13 · Dec 4, 2024 · Updated last year
- bumble bee transformer ☆14 · Apr 19, 2021 · Updated 4 years ago
- Chinese Prosodic Structure Prediction ☆10 · May 18, 2019 · Updated 6 years ago
- Tensorflow Implementation of "Theory and Experiments on Vector Quantized Autoencoders" ☆15 · Feb 27, 2019 · Updated 7 years ago
- Official code repository for AAAI2021 paper Finding Sparse Structures for Domain Specific Neural Machine Translation ☆11 · Apr 1, 2021 · Updated 4 years ago
- Node and Browser env supported WebAssembly version of fastText: Library for efficient text classification and representation learning. ☆13 · Sep 17, 2024 · Updated last year
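- A collection of Chinese-language learning and practice resources for ChatGPT, covering fine-tuning of large models such as LLaMA and ChatGLM ☆14 · Apr 17, 2023 · Updated 2 years ago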
- A demonstration of how to train a custom tokenizer similar to TikToken. ☆15 · Jan 6, 2025 · Updated last year
- Dockerfiles ☆17 · Mar 23, 2026 · Updated last week
- A Large-Scale Dataset for Long Text and Multi-Table Summarization ☆18 · Feb 21, 2024 · Updated 2 years ago
- Beautify github user activity display ☆18 · Dec 9, 2024 · Updated last year
- Batch processor to enable large content be digested by Ollama, focused around book processing and translations by default, fully, configu… ☆36 · Oct 27, 2025 · Updated 5 months ago
- TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking ☆21 · Apr 18, 2025 · Updated 11 months ago
- speech-aligner is a tool that generates phoneme-level time-alignment annotations from human speech and its corresponding text. … ☆15 · Dec 19, 2018 · Updated 7 years ago
- source code of EfficientTTS 2 ☆20 · Feb 18, 2024 · Updated 2 years ago
- Inference of MiniCPM-o 2.6 in plain C/C++ ☆33 · Oct 14, 2025 · Updated 5 months ago
- A visual comparison of word vectors trained with word2vec and word vectors generated by BERT ☆15 · Jun 29, 2020 · Updated 5 years ago
- Layer normalization in PyTorch ☆20 · Jun 6, 2020 · Updated 5 years ago
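- Built on index-tts-vllm; implements and provides an interface service that simulates streaming audio synthesis, along with a client test script ☆26 · Sep 2, 2025 · Updated 6 months ago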
- 📑 Publish GitHub Issues as blog or newsletter via GitHub actions automatically ☆13 · Jan 11, 2025 · Updated last year
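- Basic architecture of graphrag ☆46 · Oct 17, 2024 · Updated last year
- Text2Neo4j is a tool that traverses documents, extracts relations from the text, and saves them to a Neo4j database to form a knowledge graph. The project combines Dify and LLaMA3.1 (8B model) to process and extract complex relations efficiently. ☆23 · Aug 31, 2024 · Updated last year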
- Pre-trained grapheme-to-phoneme (G2P) models ☆26 · Jul 27, 2021 · Updated 4 years ago
- A lightweight audio codec based on a single quantizer ☆69 · Aug 15, 2025 · Updated 7 months ago
- ☆23 · May 14, 2024 · Updated last year
- ☆22 · Sep 1, 2025 · Updated 6 months ago
- Hackintosh EFI for MSI Pro B760M-A WIFI DDR4 Ⅱ (2 Gen) + i5-12600kf + Gigabyte Radeon RX 6600 XT Gaming OC 8G | Hackintosh macOS & Windows dual-boot configuration … ☆17 · Oct 16, 2024 · Updated last year
- ☆31 · Oct 29, 2024 · Updated last year
- CSDN of ManVictor ☆22 · Mar 31, 2025 · Updated 11 months ago
- a unity-package allows to make annotations on arbitrary Unity-scenes of architectural sites ☆15 · Dec 11, 2017 · Updated 8 years ago
- ☆10 · Oct 11, 2024 · Updated last year
- ☆35 · Updated this week
- API wrapper for the 喜鹊儿 app (青果 system) ☆14 · Jul 7, 2020 · Updated 5 years ago
- MSI B760M BOMBER OpenCore MacOS 12 - 15 ☆11 · Apr 10, 2025 · Updated 11 months ago
- Play and solve a 3x3x3 Rubik's cube with Thistlethwaite's algorithm in Unity C#. (42 Silicon Valley) ☆12 · Apr 20, 2019 · Updated 6 years ago
- YTGPT is a Google Chrome extension that leverages the power of "ChatGPT Anywhere" to provide concise summaries of YouTube videos. Designe… ☆21 · Apr 8, 2024 · Updated last year
- ☆10 · May 21, 2021 · Updated 4 years ago
- The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme… ☆45 · Sep 5, 2025 · Updated 6 months ago