文本分类是指在给定分类体系下 , 根据文本的内容自动确定文本类别的过程。首先我们根据scrapy爬虫根据中国知网URL的规律,爬取70多万条2014年公开的发明专利,然后通过数据清洗筛选出了60多万条含标签数据。通过TF-IDF对60多万条本文进行词频提取,依照词频排序提取前3000个词语形成语义词典,然后根据观察设置停用词。然后再用TF-IDF的方式对每个摘要进行词频选取,通过布尔模型,对比语义词典生成文本向量。然后对标签进行数字化转换。取90%的文本为训练集,10%的文本为测试集。用有监督学习的SVM算法对文本进行分类,(人类生活必需品、作业运输、化学冶金、纺织造纸、固定建筑物、机械工程、物理学、电学)分成8类
☆108Mar 14, 2018Updated 8 years ago
Alternatives and similar repositories for CNKI_Patent_SVM
Users that are interested in CNKI_Patent_SVM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- 使用LDA+SVM进行文本的分类☆22Jul 23, 2017Updated 8 years ago
- 爬取专利信息的爬虫☆27Sep 27, 2016Updated 9 years ago
- Implementation of "Optimizing neural networks for patent classification" paper☆14Jun 24, 2019Updated 7 years ago
- 利用支持向量机实现中文文本分类☆29May 28, 2018Updated 8 years ago
- SVM中文文本分类☆13Mar 13, 2022Updated 4 years ago
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- 📃您身边的AI法律顾问(比赛项目)☆24Feb 26, 2024Updated 2 years ago
- 基于SVM的中文文本分类; python☆13May 24, 2019Updated 7 years ago
- 基于scikit-learn实现对新浪新闻的文本分类,数据集为100w篇文档,总计10类,测试集与训练集1:1划分。分类算法采用SVM和Bayes,其中Bayes作为baseline。☆110Dec 24, 2018Updated 7 years ago
- 一个自然语言处理的可视化系统,实现自动生成词云图、文章关键信息提取、多文档主题分布、文本分类等功能,还有一些业务数据的可视化图表展示。☆38Jan 27, 2021Updated 5 years ago
- 多标签文本分类☆53Jun 8, 2019Updated 7 years ago
- a spider for cnki patent content, just for study and commucation, no use for business.☆123Dec 21, 2017Updated 8 years ago
- 毕业论文代码 + 评论文本数据获取+数据清洗+文本数据向量化+将数据放进分类器(KNN+Naive Bayes+SVM)中训练+结果评估☆55May 17, 2022Updated 4 years ago
- “达观杯”长文本智能处理挑战赛。达观数据提供了一批长文本数据和分类信息,希望选手动用自己的智慧,结合当下最先进的NLP和人工智能技术,深入分析文本内在结构和语义信息,构建文本分类模型,实现精准分类。☆10Jul 20, 2018Updated 7 years ago
- The crawler for data on web of science, especially focus on the analysis of citation data☆16Dec 14, 2018Updated 7 years ago
- Serverless GPU API endpoints on Runpod - Get Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- pytorch版损失函数,改写自科学空间文章,【通过互信息思想来缓解类别不平衡问题】、【将“softmax+交叉熵”推广到多标签分类问题】☆12Aug 22, 2021Updated 4 years ago
- pyspark+Word2Vec+Tfidf+LSH、文章相似性推荐☆26Mar 5, 2020Updated 6 years ago
- 2021软件杯-新闻智分系统项目开源,基于PaddleHub通过预训练模型ERNIE-Tiny在整合与爬取的新闻10分类数据集上进行微调完成模型训练,可实现精细的新闻长文本10分类任务。最后基于PyQt5完成GUI可视化界面开发以及基于VUE+FastAPI完成该项目的we…☆25Jan 25, 2022Updated 4 years ago
- 中国法律快查手册☆12Aug 19, 2025Updated 10 months ago
- Package to parse and analyze trademark data from the United States Patent and Trademark Office☆14Apr 5, 2017Updated 9 years ago
- AI skill for humanities thesis writing — from topic selection to publication. 8 academic databases, 21 review rules, anti-hallucination g…☆221Apr 29, 2026Updated 2 months ago
- 人脸年龄识别☆24Apr 26, 2019Updated 7 years ago
- Chinese word segmentation algorithm based on entropy(基于熵,无需语料库的中文分词)☆11Feb 27, 2018Updated 8 years ago
- NLP的一些小例子,如:文本分类、文本纠错、关键词提取、自动摘要等☆23Dec 12, 2018Updated 7 years ago
- Open source password manager - Proton Pass • AdSecurely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
- Application for processing Chinese text : Sentiment , Keywords , Abstract☆10Apr 13, 2017Updated 9 years ago
- Blockchain and Smart Contract basic code for "mining" myself !!!☆13Jun 6, 2022Updated 4 years ago
- (已失效)自动生成知网期刊文献Bibtex并导入Zotero;自定义无csl文件的Zotero文献导出样式,在任何引用格式需求下实现随写随引;(已被Zotero6.0Beta实现)将所需知网文献批量、自动化导入Zotero。☆12Sep 26, 2024Updated last year
- RankNet, LambdaRank, LambdaMART, GBrank☆14Nov 16, 2013Updated 12 years ago
- ansj_parsing 依存文法&句法分析☆19Jun 27, 2017Updated 9 years ago
- 关系抽取个人实战总结以及开源工具包使用☆55Dec 5, 2018Updated 7 years ago
- 使用tf-idf, TextRank4ZH等不同方式从中文文本中提取关键字,从中文文本中提取摘要和关键词☆34Dec 12, 2018Updated 7 years ago
- 互联网新闻情感分析赛题baseline☆42Sep 18, 2019Updated 6 years ago
- 本项目主要研究大模型在单独的法律数据集上的效果,现在支持belle和chatglm相关的模型训练,预测,验证和在线部署, 另外增加爬虫代码,langchain,结合数据库预测等功能。☆12Jul 16, 2023Updated 2 years ago
- End-to-end encrypted cloud storage - Proton Drive • AdSpecial offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
- 复现了论文《基于主题模型的短文本关键词抽取及扩展》的代码☆31Nov 11, 2020Updated 5 years ago
- Temporal and Causal Reasoning (dataset)☆10Apr 19, 2022Updated 4 years ago
- Chinese version code for the paper "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks"☆11Jul 25, 2019Updated 6 years ago
- 多标签文本分类,多标签分类,文本分类, multi-label, classifier, text classification, BERT, seq2seq,attention, multi-label-classification☆805Dec 11, 2024Updated last year
- TextClf :基于Pytorch/Sklearn的文本分类框架,包括逻辑回归、SVM、TextCNN、TextRNN、TextRCNN、DRNN、DPCNN、Bert等多种模型,通过简单配置即可完成数据处理、模型训练、测试等过程。☆246Jul 21, 2023Updated 2 years ago
- 京东评论情感分析模型,主要包括1、数据获取及探索性分析;2、文本预处理、文本分词、文本向量化、特征提取、☆84Jun 4, 2019Updated 7 years ago
- 【今日头条】文本作者身份识别比赛☆10Aug 20, 2018Updated 7 years ago