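Text classification is the process of automatically assigning a text to a category, based on its content, within a given classification scheme. First, we used a Scrapy crawler that follows the URL patterns of CNKI (China National Knowledge Infrastructure) to crawl more than 700,000 invention patents published in 2014. Data cleaning then left more than 600,000 labeled records. We used TF-IDF to extract term weights from these 600,000+ texts, sorted the terms by weight, and took the top 3,000 words to form a semantic dictionary, then set stop words based on manual inspection. Next, TF-IDF was applied again to select the key terms of each abstract, and a Boolean model compared them against the semantic dictionary to produce a text vector. The labels were then converted to numbers. Of the texts, 90% formed the training set and 10% the test set. Finally, a supervised SVM classified the texts into 8 categories: human necessities, performing operations and transporting, chemistry and metallurgy, textiles and paper, fixed constructions, mechanical engineering, physics, and electricity (the eight sections of the International Patent Classification).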
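The feature-construction steps described above (corpus-wide TF-IDF dictionary, per-abstract TF-IDF term selection, Boolean vectors, numeric labels, 90/10 split) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code. It assumes abstracts are already word-segmented into token lists (for Chinese text a segmenter such as jieba would normally do this). It also assumes "sorting by TF-IDF" means summing each term's weight over the whole corpus. The per-abstract cutoff `top_n`, all function names, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def idf_table(docs):
    """Smoothed IDF, ln((1 + n) / (1 + df)) + 1, for every term in a tokenized corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """TF-IDF weight of each term in one document (raw count times IDF)."""
    return {term: count * idf[term] for term, count in Counter(doc).items()}


def build_dictionary(docs, idf, size, stopwords=frozenset()):
    """Assumption: sum TF-IDF over the corpus, keep the `size` top-scoring non-stop-words."""
    total = Counter()
    for doc in docs:
        total.update(tfidf(doc, idf))  # Counter.update with a mapping adds the weights
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken alphabetically
    return [term for term, _ in ranked if term not in stopwords][:size]


def boolean_vector(doc, idf, dictionary, top_n):
    """Boolean model: 1 if a dictionary term is among the doc's top_n TF-IDF terms, else 0."""
    ranked = sorted(tfidf(doc, idf).items(), key=lambda kv: (-kv[1], kv[0]))
    selected = {term for term, _ in ranked[:top_n]}  # hypothetical per-abstract cutoff
    return [1 if term in selected else 0 for term in dictionary]


def encode_labels(labels):
    """Map category names to integer ids in order of first appearance."""
    mapping = {}
    for label in labels:
        mapping.setdefault(label, len(mapping))
    return [mapping[label] for label in labels], mapping


def train_test_split(xs, ys, test_ratio=0.1, seed=0):
    """Shuffle indices and hold out `test_ratio` of the samples (90/10 by default)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) - round(len(idx) * test_ratio)
    train, test = idx[:cut], idx[cut:]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in test], [ys[i] for i in test])


# Hypothetical, already-segmented toy "abstracts" (English tokens for readability).
docs = [
    ["battery", "electrode", "electrolyte"],
    ["battery", "charging", "circuit"],
    ["fiber", "weaving", "dyeing"],
    ["weaving", "yarn", "fiber"],
]
labels = ["electricity", "electricity", "textiles and paper", "textiles and paper"]

idf = idf_table(docs)
dictionary = build_dictionary(docs, idf, size=4)  # the description used 3,000 words
X = [boolean_vector(d, idf, dictionary, top_n=3) for d in docs]
y, mapping = encode_labels(labels)

print(dictionary)  # ['battery', 'fiber', 'weaving', 'charging']
print(X)           # [[1, 0, 0, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 1, 0]]
print(y, mapping)  # [0, 0, 1, 1] {'electricity': 0, 'textiles and paper': 1}
```

The Boolean step records only whether a dictionary word is among an abstract's highest-weighted terms. The cutoff therefore matters: with `top_n=2`, the first toy abstract's shared word "battery" falls out and its vector becomes all zeros. `train_test_split` is meant for the full corpus, since on four toy documents a 10% hold-out rounds to zero. The resulting vectors and integer labels could then be passed to an off-the-shelf linear SVM, for example scikit-learn's `sklearn.svm.LinearSVC` via `fit(X_train, y_train)` and `score(X_test, y_test)`. That step is left out of this sketch.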
☆108 · Mar 14, 2018 · Updated 8 years ago
Alternatives and similar repositories for CNKI_Patent_SVM
Users that are interested in CNKI_Patent_SVM are comparing it to the libraries listed below.
- Text classification using LDA + SVM ☆22 · Jul 23, 2017 · Updated 8 years ago
- Implementation of "Optimizing neural networks for patent classification" paper ☆14 · Jun 24, 2019 · Updated 6 years ago
- The USPTO Patent Exploring Tool (UPET) provides Python code for downloading, parsing, and loading USPTO patent bulk data into a local MyS… ☆34 · May 5, 2013 · Updated 13 years ago
- Chinese text classification with support vector machines ☆29 · May 28, 2018 · Updated 7 years ago
- SVM Chinese text classification ☆13 · Mar 13, 2022 · Updated 4 years ago
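- 📃 An AI legal advisor at your side (competition project) ☆24 · Feb 26, 2024 · Updated 2 years ago
- SVM-based Chinese text classification; Python ☆13 · May 24, 2019 · Updated 6 years ago
- Text classification of Sina News implemented with scikit-learn. The dataset has 1,000,000 documents in 10 classes, split 1:1 between test and training sets. The classifiers are SVM and Bayes, with Bayes as the baseline. ☆110 · Dec 24, 2018 · Updated 7 years ago
- A visualization system for natural language processing that automatically generates word clouds, extracts key information from articles, shows topic distributions across multiple documents, and classifies text, plus visual charts for some business data. ☆38 · Jan 27, 2021 · Updated 5 years ago
- Multi-label text classification ☆53 · Jun 8, 2019 · Updated 6 years ago
- A spider for CNKI patent content, for study and communication only, not for commercial use. ☆123 · Dec 21, 2017 · Updated 8 years ago
- Analysis of annual reports of listed companies ☆12 · Jul 16, 2019 · Updated 6 years ago
- Graduation thesis code: review-text data collection + data cleaning + text vectorization + training classifiers (KNN + Naive Bayes + SVM) + result evaluation ☆55 · May 17, 2022 · Updated 4 years ago
- The "Daguan Cup" (达观杯) long-text intelligent processing challenge. Daguan Data (达观数据) provided a set of long-text data with category information and invited participants to combine state-of-the-art NLP and AI techniques to analyze the texts' internal structure and semantics in depth, build a text classification model, and classify accurately. ☆10 · Jul 20, 2018 · Updated 7 years ago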
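- Qimen stands for the art of Qimen Dunjia (奇门遁甲); a tool for extracting various kinds of entities. ☆29 · Jan 12, 2020 · Updated 6 years ago
- pyspark + Word2Vec + TF-IDF + LSH for article-similarity recommendation ☆26 · Mar 5, 2020 · Updated 6 years ago
- Open-sourced 2021 Software Cup project: an intelligent news classification system. Using PaddleHub, the pretrained model ERNIE-Tiny is fine-tuned on a merged and crawled 10-class news dataset, enabling fine-grained 10-class classification of long news texts. Finally, a GUI was built with PyQt5, and VUE + FastAPI were used to complete the project's we… ☆25 · Jan 25, 2022 · Updated 4 years ago
- Python scraper for invalidation decisions and reexamination decisions of the Patent Reexamination Board ☆16 · Mar 5, 2019 · Updated 7 years ago
- The enhanced RCNN model used for sentence similarity classification ☆44 · May 30, 2021 · Updated 4 years ago
- Implementation of Deep Dirichlet Multinomial Regression in python + cython. ☆16 · Mar 7, 2018 · Updated 8 years ago
- Quick-reference handbook of Chinese law ☆12 · Aug 19, 2025 · Updated 9 months ago
- Handbook for looking up commonly used Chinese laws | Law-Book ☆52 · Aug 31, 2022 · Updated 3 years ago
- Short-text classification with Embedding + LSTM in the Keras framework (semi-supervised) ☆16 · Nov 13, 2017 · Updated 8 years ago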
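- Feature selection for text classification ☆11 · Aug 12, 2017 · Updated 8 years ago
- Face age recognition ☆24 · Apr 26, 2019 · Updated 7 years ago
- This project is simply for the purpose of education. The primary objective of this project is to build a deep-learning model to identify … ☆10 · Apr 1, 2021 · Updated 5 years ago
- A series of analyses on collected legal documents, including automatic segmentation based on standard document formats, case similarity computation, case clustering, and recommendation of legal provisions (experiments currently use marriage cases but can be extended to other domains). ☆203 · Mar 21, 2017 · Updated 9 years ago
- Application for processing Chinese text: Sentiment, Keywords, Abstract ☆10 · Apr 13, 2017 · Updated 9 years ago
- Implements k-means, KNN, SVM, Bayes, topic_extraction, and other algorithms with scikit-learn, and evaluates classification precision, recall, and F-score. The corpus is Chinese text. ☆43 · Jul 23, 2017 · Updated 8 years ago
- (No longer working) Automatically generates BibTeX for CNKI journal articles and imports it into Zotero; defines custom Zotero export styles without CSL files, enabling cite-as-you-write under any citation-format requirement; (now built into Zotero 6.0 Beta) batch, automated import of CNKI articles into Zotero. ☆12 · Sep 26, 2024 · Updated last year
- RankNet, LambdaRank, LambdaMART, GBrank ☆14 · Nov 16, 2013 · Updated 12 years ago
- PatentSBERTa: A Deep NLP based Hybrid Model for Patent Distance and Classification using Augmented SBERT ☆117 · Nov 3, 2024 · Updated last year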
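- Personal hands-on notes on relation extraction and on using open-source toolkits ☆55 · Dec 5, 2018 · Updated 7 years ago
- Extracts keywords and summaries from Chinese text using different methods such as TF-IDF and TextRank4ZH ☆34 · Dec 12, 2018 · Updated 7 years ago
- Baseline for the internet news sentiment analysis competition task ☆42 · Sep 18, 2019 · Updated 6 years ago
- Code reproducing the paper "Short-Text Keyword Extraction and Expansion Based on Topic Models" (《基于主题模型的短文本关键词抽取及扩展》) ☆31 · Nov 11, 2020 · Updated 5 years ago
- Uses part of Dianping's 2018 user reviews as the dataset (4.4 million reviews before filtering). After labeling the dataset and applying Chinese text preprocessing, feature extraction, and feature weighting, it classifies with classic machine learning methods such as SVM, Naive Bayes, and AdaBoost, and then trains a Bi-LSTM deep neural network for classification. ☆13 · Nov 11, 2021 · Updated 4 years ago
- Chinese version code for the paper "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks" ☆11 · Jul 25, 2019 · Updated 6 years ago
- Temporal and Causal Reasoning (dataset) ☆10 · Apr 19, 2022 · Updated 4 years ago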