2hip3ng / chinese-text-clean
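Chinese text data cleaning: remove URLs, remove characters other than Chinese characters, English letters, and digits, segment words, remove stop words, and remove blank lines (add custom cleaning steps as the text requires)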
☆17 Updated 5 years ago
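The cleaning steps listed in the description can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the names `clean_line`, `clean_text`, and `simple_tokenize` are hypothetical, "Chinese" is approximated by the CJK Unified Ideographs range U+4E00 to U+9FFF, and the default tokenizer simply splits Chinese text into single characters so that the sketch depends only on the standard library.

```python
import re
from typing import Callable, Iterable, List, Set

# Assumption: a URL starts with http(s):// or www. and runs until whitespace
# or the first Chinese character.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s\u4e00-\u9fff]+", re.IGNORECASE)
# Anything that is not a CJK Unified Ideograph, an ASCII letter, or an ASCII digit.
NON_KEEP_RE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]+")
# Stand-in tokenizer pattern: one Chinese character, or a run of letters/digits.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def simple_tokenize(text: str) -> List[str]:
    """Hypothetical fallback tokenizer; a real segmenter can be swapped in."""
    return TOKEN_RE.findall(text)


def clean_line(
    line: str,
    stopwords: Set[str],
    tokenize: Callable[[str], Iterable[str]] = simple_tokenize,
) -> List[str]:
    line = URL_RE.sub(" ", line)        # 1. remove URLs
    line = NON_KEEP_RE.sub(" ", line)   # 2. remove other characters
    tokens = tokenize(line)             # 3. segment into words
    # 4. remove stop words (and any whitespace-only tokens)
    return [tok for tok in tokens if tok.strip() and tok not in stopwords]


def clean_text(
    lines: Iterable[str],
    stopwords: Set[str] = frozenset(),
    tokenize: Callable[[str], Iterable[str]] = simple_tokenize,
) -> List[str]:
    cleaned = []
    for line in lines:
        tokens = clean_line(line, stopwords, tokenize)
        if tokens:                      # 5. drop lines that end up empty
            cleaned.append(" ".join(tokens))
    return cleaned


if __name__ == "__main__":
    sample = ["访问 https://example.com/a?b=1 了解Python3!", "", "###"]
    print(clean_text(sample, stopwords={"了"}))
```

For word-level segmentation, a segmenter such as `jieba.lcut` could be passed as `tokenize`, assuming the `jieba` package is installed; the stop-word set would then hold whole words instead of single characters.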
Alternatives and similar repositories for chinese-text-clean:
Users who are interested in chinese-text-clean are comparing it to the libraries listed below:
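- Chinese sentence embedding generation based on SimCSE ☆15 Updated 2 years ago
- This is a small NLP project "E-commerce Title Data Similarity Matching System". The usage methods are: TF-IDF + bag-of-words model, cosine simil… ☆25 Updated 4 years ago
- Fine-tuning a pre-trained BERT model to compute text similarity ☆104 Updated last year
- PyTorch implementation of unsupervised SimCSE for Chinese ☆134 Updated 3 years ago
- Named entity recognition with Baidu UIE, implemented in PyTorch. ☆57 Updated 2 years ago
- A light NER tool: an NER annotation tool built on Vue & FastAPI, with NER data augmentation ☆64 Updated 4 years ago
- Domain-specific lexicon construction / Chinese new-word discovery / domain lexicon discovery ☆29 Updated 5 years ago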
- Chinese version of Longformer ☆113 Updated 4 years ago
- Relation extraction with Efficient-GlobalPointer ☆23 Updated 3 years ago
- An optimized version of GlobalPointer for NER ☆120 Updated 3 years ago
- Chinese BigBird pre-trained model ☆91 Updated 2 years ago
- NLP experiment: new-word mining + continued pre-training of a pre-trained model ☆47 Updated last year
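- Unsupervised Chinese keyphrase extraction: statistics-based, graph-based (LDA and PageRank: TextRank, TPR, Salience Rank, Single TPR, etc.), and embedding-based (SIFRank, etc.); works out of the box! ☆104 Updated 2 years ago
- Knowledge distillation in PyTorch (Chinese text classification) ☆18 Updated 2 years ago
- Entity/relation/event extraction based on GlobalPointer ☆146 Updated 3 years ago
- Named entity recognition for the e-commerce domain ☆9 Updated 2 years ago
- Continued pre-training of Chinese BERT ☆30 Updated 3 years ago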
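- Text classification project on TensorFlow 2.3, supporting a variety of classification models and related tricks. ☆175 Updated 5 months ago
- NLP sentence encoding, sentence embeddings, and semantic similarity: BERT_avg, BERT_whitening, SBERT, SimCSE ☆175 Updated 3 years ago
- NLP tools: word segmentation, sentence segmentation, new-word discovery ☆25 Updated last year
- Train and evaluate on your own text similarity dataset using sentence-transformers (SBERT). ☆47 Updated 3 years ago
- Chinese named entity recognition with PyTorch and BiLSTM-CRF ☆14 Updated 2 years ago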
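- Sentence-Transformers Information Retrieval example on Chinese ☆29 Updated last year
- This project uses the Chinese UniLM model trained by 云问科技 to automatically generate titles for a Weibo dataset. ☆38 Updated last year
- 🌈 NERpy: Implementation of Named Entity Recognition using Python. A named entity recognition toolkit supporting BertSoftmax, BertSpan, and other models; works out of the box. ☆113 Updated last year
- Medical named entity recognition using the extractive UIE open-sourced in PaddleNLP (torch implementation) ☆44 Updated 2 years ago
- Information extraction with pointer networks, covering named entity recognition, relation extraction, and event extraction. ☆123 Updated 2 years ago
- BERT classification, semantic similarity, and sentence embedding extraction. ☆64 Updated last month
- Fine-tuning a BERT PyTorch model for multi-label text classification ☆132 Updated 5 years ago
- This project uses Keras and ALBERT for multi-class text classification, with ALBERT fine-tuned. ☆17 Updated 4 years ago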