adetion/txtfilemerge

TXT text corpus data cleaning: (1) merge TXT files; (2) filter out noise strings; (3) mask person names, place names, and organization names; (4) convert other encodings to UTF-8

☆19

Alternatives and similar repositories for txtfilemerge

Users who are interested in txtfilemerge are comparing it to the repositories listed below.

Many0therFunctions / MaskGCT-Text-To-Semantic-Finetune
This is not remotely close to a finished product, and does not intend to nor does this claim to be working fine-tuning code for MaskGCT. …
☆13 · Dec 4, 2024 · Updated last year
SystemPanic / flashinfer-windows
FlashInfer: Kernel Library for LLM Serving (Windows build & kernels)
☆16 · Jul 1, 2026 · Updated 3 weeks ago
npujcong / Chinese_PSP
Chinese Prosodic Structure Prediction
☆10 · May 18, 2019 · Updated 7 years ago
carterwayneskhizeine / hermes-agent-windows-R
Windows-native adaptation fork of Hermes Agent, based on upstream Hermes Agent 0.13.0. Improves local runtime environment, path handling,…
☆19 · May 15, 2026 · Updated 2 months ago
jaywalnut310 / Vector-Quantized-Autoencoders
Tensorflow Implementation of "Theory and Experiments on Vector Quantized Autoencoders"
☆15 · Feb 27, 2019 · Updated 7 years ago
ohlionel / Prune-Tune
Official code repository for AAAI2021 paper Finding Sparse Structures for Domain Specific Neural Machine Translation
☆11 · Apr 1, 2021 · Updated 5 years ago
PristineStream / ChatGPT-Chinese-Tutorial
A collection of Chinese-language learning and practice resources for ChatGPT: fine-tuning of large models such as LLaMA and ChatGLM
☆14 · Updated this week
StevenLau6 / FINDSum
A Large-Scale Dataset for Long Text and Multi-Table Summarization
☆18 · Feb 21, 2024 · Updated 2 years ago
Rorical / clip-as-service-rs
A blazing fast CLIP gRPC service in rust.
☆16 · Aug 9, 2023 · Updated 2 years ago
hongwen-sun / speech-aligner
speech-aligner is a tool that generates phoneme-level time-alignment annotations from human speech and its corresponding text.
☆15 · Dec 19, 2018 · Updated 7 years ago
mcf330 / efts2code
source code of EfficientTTS 2
☆21 · Feb 18, 2024 · Updated 2 years ago
hamidkarimi / dope
This repository holds the code for the paper "Online Academic Course Performance Prediction using Relational Graph Convolutional Neural Network"
☆11 · Jul 25, 2024 · Updated 2 years ago
gxr404 / github-activity
Beautify github user activity display
☆18 · Dec 9, 2024 · Updated last year
CyberZHG / torch-layer-normalization
Layer normalization in PyTorch
☆20 · Jun 6, 2020 · Updated 6 years ago
waylandzhang / train_tokenizer
A demonstration of how to train a custom tokenizer similar to TikToken.
☆15 · Jan 6, 2025 · Updated last year
petronny / g2p
Pre-trained grapheme-to-phoneme (G2P) models
☆26 · Jul 27, 2021 · Updated 5 years ago
wenerme / dockerfiles
Dockerfiles
☆17 · Jul 5, 2026 · Updated 3 weeks ago
Thireus / llama.cpp
Thireus's fork of llama.cpp with Cuda 12.8 and 13.3 release builds and Windows patch for loading more .gguf shards + llama-sweep-bench
☆30 · Updated this week
imfing / issues-blog
📑 Publish GitHub Issues as blog or newsletter via GitHub actions automatically
☆13 · Jan 11, 2025 · Updated last year
kingname / TeamFlowy
A simple sync tool to sync task from Workflowy to Teambition
☆32 · Oct 4, 2017 · Updated 8 years ago
WuNein / vllm4mteb
vLLM for embedding tasks using Original LLMs (Qwen2, LLaMA)
☆29 · Sep 9, 2024 · Updated last year
sazima / buddy-ai
Buddy AI - AI Browser Agent: Automate Multi-Step Tasks with Natural Language; a Chrome extension in which an LLM automatically performs browser tasks
☆15 · Jul 4, 2026 · Updated 3 weeks ago
mbzuai-nlp / sttatts
☆31 · Oct 29, 2024 · Updated last year
zjzser / TraceableSpeech
TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking
☆21 · Apr 18, 2025 · Updated last year
SYSU-MUCFC-FinTech-Research-Center / ZhiLu
ZhiLu: a Chinese conversational large language model for the consumer finance domain
☆30 · Nov 12, 2023 · Updated 2 years ago
framefield / vr-annotate
A Unity package that allows making annotations on arbitrary Unity scenes of architectural sites
☆15 · Dec 11, 2017 · Updated 8 years ago
youseegreen / Unity_DesktopMascot_Framework
A framework for creating desktop mascots in Unity, intended to be lightweight
☆12 · Sep 9, 2020 · Updated 5 years ago
TakeshiCho / SignedDistanceField_Map_Generator
Signed Distance Field Map Generator
☆10 · Jun 19, 2023 · Updated 3 years ago
zhai-lw / SQCodec
A lightweight audio codec based on a single quantizer
☆72 · Aug 15, 2025 · Updated 11 months ago
Jamessfks / mace
SimpleAtom, a web interface for MACE (Message Passing Atomic Cluster Expansion), a machine-learning-based interatomic potential
☆17 · Jul 11, 2026 · Updated 2 weeks ago
InsightEdge01 / ScrapegraphAIOllamallama3
☆24 · May 14, 2024 · Updated 2 years ago
ashih42 / Rubik
Play and solve a 3x3x3 Rubik's cube with Thistlethwaite's algorithm in Unity C#. (42 Silicon Valley)
☆12 · Apr 20, 2019 · Updated 7 years ago
chromee / FullEmojiSupportApi
☆10 · May 21, 2021 · Updated 5 years ago
ETH-DISCO / audio-atlas
☆15 · Feb 6, 2026 · Updated 5 months ago
lzh1590 / MMDCameraPath
MMD Camera Path Vmd's File For Unity3D
☆12 · Sep 19, 2017 · Updated 8 years ago
jfranmora / unity-postprocessing-cut-screen-shader
☆14 · Jul 30, 2019 · Updated 6 years ago
ARYKEI / unity-ParallaxHairShader
4-layer(RGBA) parallax hair shader
☆16 · May 8, 2018 · Updated 8 years ago
wuchuheng / web-sqlite-js
web-sqlite-js is a friendly, out-of-the-box SQLite database for the web that makes persistent client-side storage simple for every develo…
☆18 · Jan 28, 2026 · Updated 6 months ago
hackintosh-club / MSI-B760M-BOMBER-WIFI-OpenCore
MSI B760M BOMBER OpenCore MacOS 12 - 15
☆11 · Apr 10, 2025 · Updated last year