hiyoung123 / DuplicateRemove
A simhash-based text deduplication algorithm
☆20 Updated 3 years ago
Alternatives and similar repositories for DuplicateRemove
Users who are interested in DuplicateRemove are comparing it to the libraries listed below.
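- Chinese text error correction based on seq2edit (Gector). ☆28 Updated 2 years ago
- Chinese sentence embeddings with bert_avg, bert_whitening, sbert, consert, simcse, esimcse ☆16 Updated 3 years ago
- A PyTorch-based scaffold for Chinese text classification, with a comparison of commonly used models. ☆18 Updated 4 years ago
- Regex-based extraction and normalization of time keywords ☆21 Updated 3 years ago
- Baseline for the Intelligent Text Proofreading Competition (Chinese Text Correction) ☆67 Updated 2 years ago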
- An experimental Torch implementation of Su Jianlin's CoSENT ☆32 Updated 3 years ago
- sodic2021 legal consultation intelligent Q&A baseline, online score 35+ ☆17 Updated 4 years ago
- ☆17 Updated 4 years ago
- Keyword extraction project ☆24 Updated 4 years ago
- benchmark of KgCLUE, with different models and methods ☆27 Updated 3 years ago
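- Sentence matching models, including the unsupervised SimCSE, ESimCSE, and PromptBERT, and the supervised SBERT and CoSENT. ☆99 Updated 2 years ago
- NLP experiments: new word discovery + continued pre-training of pretrained models ☆47 Updated last year
- Chinese bigbird pretrained model ☆92 Updated 2 years ago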
- Using lear for NER extraction ☆29 Updated 3 years ago
- ☆57 Updated 2 years ago
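- A retrieval-based dialogue system solution built on vector recall, dense retrieval, FAQ…… ☆33 Updated 3 years ago
- Chinese keyword extraction based on pretrained models (Chinese version of the code for the paper SIFRank: A New Baseline for Unsupervised Keyphrase Extraction Based on Pre-trained Language Model) ☆12 Updated 5 years ago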
- Sentence-Transformers Information Retrieval example on Chinese ☆29 Updated last year
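- Chinese text correction model, implemented in Keras ☆74 Updated 3 years ago
- 开课吧 & 后厂理工学院, Baidu NLP Project 2: multi-label text classification on an exam question dataset. Models: FastText, TextCNN, GCN, BERT, et al. ☆48 Updated 5 years ago
- Medical named entity recognition using the extractive UIE open-sourced by PaddleNLP (torch implementation) ☆43 Updated 2 years ago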
- Seq2seqAttGeneration, a basic implementation of text generation that uses a seq2seq attention model to generate poem series. this project… ☆18 Updated 4 years ago
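- PyTorch implementation of Chinese multi-turn Cdial based on gpt + nezha ☆11 Updated 2 years ago
- PyTorch implementation of the unsupervised SimCSE semantic similarity model ☆22 Updated 4 years ago
- Long-text similarity model ☆21 Updated last year
- ☆87 Updated 3 years ago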
- 法研杯 (Fayan Cup) crime amount extraction ☆12 Updated 3 years ago
- CTC2021: SOTA solution and online demo for the Chinese text correction competition ☆72 Updated last year
- DescriptionPairsExtraction, a program that extracts entity and description pairs, based on Albert and data back-annotation. Based on Albert and structured-data back-annotation… ☆20 Updated 3 years ago
- Fine-tuning a BERT model on Chinese in-domain data with character masking and wwm masking respectively, without the tensorflow estimator ☆23 Updated 5 years ago