bojone/word-discovery

bojone / word-discovery

Faster, more effective Chinese new word discovery

☆512

Alternatives and similar repositories for word-discovery

Users who are interested in word-discovery are comparing it to the libraries listed below.

zhanzecheng / Chinese_segment_augment
New word discovery based on mutual information and left/right entropy, implemented in Python 3
☆593 · Aug 1, 2019 · Updated 6 years ago
Rayarrow / New-Word-Discovery
New word discovery based on word frequency, cohesion coefficient, and left/right adjacency entropy
☆122 · Mar 14, 2020 · Updated 6 years ago
HI-AGI / kaitian-xinci
Kaitian-Xinci (开天-新词), a Chinese New Word Discovery Tool
☆22 · Dec 5, 2019 · Updated 6 years ago
sing1ee / dict_build
Automatic construction of a Chinese lexicon: http://www.matrix67.com/blog/archives/5044
☆656 · Dec 5, 2023 · Updated 2 years ago
brightmart / albert_zh
A Lite BERT for Self-supervised Learning of Language Representations; large-scale Chinese pre-trained ALBERT models
☆3,981 · Nov 21, 2022 · Updated 3 years ago
smoothnlp / SmoothNLP
An NLP toolset with a focus on explainable inference
☆623 · Feb 3, 2021 · Updated 5 years ago
ZhuiyiTechnology / simbert
A BERT for retrieval and generation
☆860 · Feb 26, 2021 · Updated 5 years ago
ZhuiyiTechnology / pretrained-models
Open Language Pre-trained Model Zoo
☆1,003 · Nov 18, 2021 · Updated 4 years ago
bojone / kg-2019
Source code of the “科学空间队” (Scientific Spaces team) for Baidu's 2019 triple extraction competition
☆766 · May 16, 2020 · Updated 6 years ago
Mechanic934 / New-Word-Detection
New word detection algorithm (NewWordDetection)
☆63 · Sep 4, 2017 · Updated 8 years ago
shibing624 / pycorrector
pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and Qwen2.5 to error-correction scenarios and works out of the box.
☆6,495 · Updated this week
yanghanxy / New-Word-Detection
New word detection algorithm (NewWordDetection)
☆91 · Mar 22, 2021 · Updated 5 years ago
iqiyi / FASPell
2019 SOTA spell checker for Simplified and Traditional Chinese: FASPell Chinese Spell Checker (Chinese spell check / spelling error detection / spelling error correction)
☆1,224 · Sep 3, 2022 · Updated 3 years ago
Moonshile / ChineseWordSegmentation
Chinese word segmentation algorithm without a corpus
☆499 · Sep 3, 2020 · Updated 5 years ago
dbiir / UER-py
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
☆3,111 · May 9, 2024 · Updated 2 years ago
bojone / nlp-zero
An NLP toolkit based on the minimum entropy principle
☆139 · Jan 14, 2022 · Updated 4 years ago
brightmart / roberta_zh
RoBERTa for Chinese: pre-trained Chinese RoBERTa models
☆2,793 · Jul 22, 2024 · Updated 2 years ago
brightmart / nlp_chinese_corpus
Large Scale Chinese Corpus for NLP
☆9,907 · Feb 6, 2026 · Updated 5 months ago
bojone / bert4keras
Keras implementation of transformers for humans
☆5,417 · Nov 11, 2024 · Updated last year
huawei-noah / Pretrained-Language-Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
☆3,162 · Jan 22, 2024 · Updated 2 years ago
ymcui / Chinese-BERT-wwm
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
☆10,224 · Apr 19, 2026 · Updated 3 months ago
yongzhuo / Macropodus
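Macropodus, a natural language processing tool built on an Albert+BiLSTM+CRF deep learning architecture: Chinese word segmentation, part-of-speech tagging, named entity recognition, new word discovery, keyword extraction, text summarization, text similarity, scientific calculator, conversion between Chinese numerals and Arabic (Roman) numerals, Traditional/Simplified Chinese conversion, and pinyin conversion. Toolkit (tool) of N…
☆660 · Mar 24, 2023 · Updated 3 years ago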
izisong / new-words-discovery
New word discovery
☆66 · May 30, 2014 · Updated 12 years ago
CLUEbenchmark / CLUEPretrainedModels
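A collection of high-quality Chinese pre-trained models: state-of-the-art large models, fastest small models, and specialized similarity models
☆810 · Jul 8, 2020 · Updated 6 years ago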
sunyilgdx / SIFRank_zh
Keyphrase or keyword extraction: a Chinese keyword extraction method based on pre-trained models (paper: SIFRank: A New Baseline for Unsupervised Keyphrase Extraction Based on Pre-trained La…
☆431 · May 17, 2020 · Updated 6 years ago
Chuanyunux / Chinese-NewWordRecognition
Domain-specific lexicon construction / Chinese new word discovery / domain lexicon discovery
☆31 · Jan 10, 2020 · Updated 6 years ago
ownthink / Jiagu
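Jiagu, a deep learning natural language processing tool: knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, and text clustering
☆3,426 · May 7, 2022 · Updated 4 years ago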
macanv / BERT-BiLSTM-CRF-NER
TensorFlow solution for the NER task using a BiLSTM-CRF model with Google BERT fine-tuning, plus private server services
☆4,905 · Feb 24, 2021 · Updated 5 years ago
mattzheng / py-kenlm-model
python | Efficient use of the statistical language model kenlm: new word discovery, word segmentation, intelligent error correction, and more
☆172 · Sep 27, 2019 · Updated 6 years ago
ChineseGLUE / ChineseGLUE
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard
☆1,782 · Feb 18, 2023 · Updated 3 years ago
CLUEbenchmark / CLUE
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
☆4,274 · Feb 6, 2026 · Updated 5 months ago
bojone / infomap
A beautiful method for clustering or community detection
☆52 · Oct 19, 2019 · Updated 6 years ago
blmoistawinde / HarvestText
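Text mining and preprocessing toolkit (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.) using unsupervised or weakly supervised methods
☆2,625 · May 13, 2024 · Updated 2 years ago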
luozhouyang / AutoPhraseX
Automated Phrase Mining from Massive Text Corpora in Python.
☆176 · May 23, 2021 · Updated 5 years ago
ymcui / Chinese-XLNet
Pre-Trained Chinese XLNet
☆1,647 · Apr 19, 2026 · Updated 3 months ago
ymcui / Chinese-ELECTRA
Pre-trained Chinese ELECTRA
☆1,433 · Apr 19, 2026 · Updated 3 months ago
panchunguang / ccks_baidu_entity_link
CCKS Baidu entity linking: first-place entity linking solution
☆841 · Dec 19, 2023 · Updated 2 years ago
thunlp / OpenCLaP
Open Chinese Language Pre-trained Model Zoo
☆983 · Mar 18, 2020 · Updated 6 years ago
425776024 / nlpcda
One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda
☆1,879 · Mar 18, 2025 · Updated last year