adetion / txtfilemerge
TXT text corpus data cleaning: 1> merge TXT files; 2> filter out noise strings; 3> mask personal names, place names, and organization names; 4> convert other encodings to UTF-8
☆18 Updated 2 years ago
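The four steps in the description above can be sketched roughly as follows. This is a minimal illustration, not the code used by txtfilemerge: the function names, the fallback encoding list, the noise patterns, and the idea of masking from a caller-supplied entity list (rather than running a named entity recognizer) are all assumptions made for the example.

```python
import re
from pathlib import Path

# Candidate encodings tried in order; an assumption, not taken from txtfilemerge.
FALLBACK_ENCODINGS = ("utf-8", "gb18030", "big5")

# Example noise patterns (URLs, runs of spaces/tabs); purely illustrative.
NOISE_PATTERNS = [re.compile(r"https?://\S+"), re.compile(r"[ \t]{2,}")]


def decode_bytes(raw: bytes) -> str:
    """Decode bytes by trying each fallback encoding in turn."""
    for enc in FALLBACK_ENCODINGS:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: replace undecodable bytes instead of failing.
    return raw.decode("utf-8", errors="replace")


def clean_text(text: str, entities: list[str], mask: str = "***") -> str:
    """Replace noise patterns with a space, then mask each listed entity."""
    for pattern in NOISE_PATTERNS:
        text = pattern.sub(" ", text)
    # Mask longer entities first so a short name cannot split a longer one.
    for entity in sorted(entities, key=len, reverse=True):
        text = text.replace(entity, mask)
    return text


def merge_txt_files(src_dir: str, out_path: str, entities: list[str]) -> int:
    """Merge every .txt file in src_dir into one UTF-8 file; return the file count."""
    out = Path(out_path).resolve()
    # Skip the output file itself in case it lives in src_dir.
    files = sorted(p for p in Path(src_dir).glob("*.txt") if p.resolve() != out)
    with open(out, "w", encoding="utf-8") as fh:
        for path in files:
            fh.write(clean_text(decode_bytes(path.read_bytes()), entities))
            fh.write("\n")
    return len(files)
```

A call such as `merge_txt_files("corpus", "merged.txt", ["张三", "北京"])` (hypothetical paths and names) would write one cleaned UTF-8 file. In practice the entity list would more likely come from a named entity recognition tool than from a hand-written list.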
Alternatives and similar repositories for txtfilemerge
Users that are interested in txtfilemerge are comparing it to the libraries listed below
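- Chinese text similarity calculator ☆160 Updated last year
- MiniRBT (a series of small Chinese pre-trained models) ☆294 Updated 2 months ago
- Resource collection for the "Digital Humanities Tutorial" ☆102 Updated last year
- Chinese NLP resource repository: a collection of corpora, related frameworks, and articles. ☆26 Updated 3 years ago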
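- A Python toolkit for file processing, text cleaning and data splitting. ☆33 Updated 2 years ago
- cntext is a Chinese text analysis Python library designed for empirical social science research. Beyond traditional word frequency counts and sentiment analysis, it supports advanced features such as word embedding training and semantic projection, helping researchers measure abstract constructs (such as attitudes, cognition, cultural beliefs, and psychological states) from large-scale unstructured text. ☆369 Updated 2 weeks ago
- A simple, fast tool for word segmentation and named entity recognition ☆610 Updated last week
- CINO: Pre-trained Language Models for Chinese Minority (pre-trained language models for ethnic minority languages) ☆253 Updated 2 months ago
- Hate speech corpus ☆24 Updated 2 years ago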
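- Mimix: A Text Generation Tool and Pretrained Chinese Models ☆158 Updated 11 months ago
- An NLP package for Chinese text: Preprocessing, Tokenization, Chinese Fonts, Word Embeddings, Text Similarity and Sentiment Analysis. Lightweight Chinese … ☆30 Updated 11 months ago
- [COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset ☆641 Updated 2 years ago
- YAYI information extraction large model: instruction-tuned on millions of manually constructed, high-quality information extraction examples; developed by the 中科闻歌 (Zhongke Wenge) algorithm team. (Repo for YAYI Unified Information Extraction Model) ☆310 Updated last year
- A comprehensive platform for Chinese text correction that integrates academic research, model training, model evaluation, and inference deployment, covering two core areas: spelling correction and grammatical error correction. ☆385 Updated last month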
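- Minimal keyword extraction with BERT ☆88 Updated 3 years ago
- A pre-training-based tool for generating sentence embeddings ☆138 Updated 2 years ago
- Chinese intent recognition and slot filling based on PyTorch ☆190 Updated last month
- People's Daily crawler (Python) ☆143 Updated 2 months ago
- Alpaca Chinese Dataset -- Chinese instruction fine-tuning dataset ☆214 Updated 11 months ago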
- ☆162 Updated last year
- ☆59 Updated 4 years ago
- A PyTorch implementation of the PaddleNLP UIE model ☆652 Updated 2 years ago
- ChatGPT WebUI using gradio. A simple, easy-to-use web UI for LLM chat and retrieval-based knowledge Q&A (RAG). ☆135 Updated last year
- A tool for extracting, parsing, and normalizing time expressions ☆55 Updated 2 years ago
- ChatGLM-6B fine-tuning. ☆136 Updated 2 years ago
- baichuan and baichuan2 finetuning and alpaca finetuning ☆33 Updated 6 months ago
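- A convenient Chinese word segmentation tool ☆48 Updated 4 months ago
- Making NLP something everyone can do. Open source is hard work, so remember to star! ☆101 Updated 2 years ago
- pke_zh, python keyphrase extraction for chinese(zh). A tool for extracting Chinese keywords or key sentences; implements KeyBert, PositionRank, TopicRank, TextRank, and other algorithms, ready to use out of the box. ☆207 Updated last year
- Evaluates the fluency of natural language ☆118 Updated 4 years ago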