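Text classification is the process of automatically assigning a text to a category, based on its content, within a given classification scheme. We first use a Scrapy crawler that exploits the URL pattern of CNKI (中国知网) to crawl more than 700,000 invention patents published in 2014, then clean the data to obtain more than 600,000 labeled records. TF-IDF is applied to these 600,000+ texts to extract term frequencies, and the top 3,000 terms in that ranking form a semantic dictionary; stop words are then set by inspection. TF-IDF is applied again to select terms from each abstract, and a Boolean model compares those terms against the semantic dictionary to produce a text vector. The labels are then converted to numeric form. 90% of the texts are used as the training set and 10% as the test set. A supervised SVM classifies the texts into 8 categories: Human Necessities; Performing Operations and Transporting; Chemistry and Metallurgy; Textiles and Paper; Fixed Constructions; Mechanical Engineering; Physics; Electricity.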
☆108 · Mar 14, 2018 · Updated 8 years ago
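
The feature-building steps in the description (dictionary from TF-IDF ranking, stop words, Boolean text vectors, numeric labels, 90/10 split) can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: it assumes the abstracts are already segmented into tokens, ranks terms by summed smoothed TF-IDF (one possible reading of the ranking step), and skips the per-abstract term selection. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def build_dictionary(docs, stop_words, top_k):
    """Rank terms by summed TF-IDF over the corpus and keep the top_k terms.

    Assumption: `docs` is a list of token lists (already word-segmented).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = Counter()
    for tokens in docs:
        if not tokens:
            continue
        for term, count in Counter(tokens).items():
            if term in stop_words:
                continue
            tf = count / len(tokens)
            # Smoothed IDF; the exact weighting used by the project is unknown.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            scores[term] += tf * idf

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def boolean_vector(tokens, dictionary):
    """Boolean model: 1 if the dictionary term occurs in the text, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in dictionary]


def encode_labels(labels):
    """Map category names to integer ids (sorted for a stable mapping)."""
    mapping = {name: i for i, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping


def train_test_split(rows, test_ratio=0.1, seed=0):
    """Shuffle and split rows into (train, test); 10% test by default."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]


# Toy data standing in for segmented patent abstracts (hypothetical).
docs = [
    ["circuit", "voltage", "the", "circuit"],
    ["alloy", "steel", "the", "furnace"],
    ["voltage", "battery", "the"],
    ["steel", "alloy", "the", "smelting"],
]
labels = ["Electricity", "Chemistry and Metallurgy",
          "Electricity", "Chemistry and Metallurgy"]

dictionary = build_dictionary(docs, stop_words={"the"}, top_k=3)
vectors = [boolean_vector(tokens, dictionary) for tokens in docs]
label_ids, label_map = encode_labels(labels)
train, test = train_test_split(list(zip(vectors, label_ids)))
print(dictionary, label_map, len(train), len(test))
```

The resulting Boolean vectors and integer labels are in the shape a standard SVM implementation expects, for example scikit-learn's `sklearn.svm.LinearSVC` with `fit(X_train, y_train)`; whether the project uses that particular class is not stated in the description.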
Alternatives and similar repositories for CNKI_Patent_SVM
Users that are interested in CNKI_Patent_SVM are comparing it to the libraries listed below.
- Text classification using LDA + SVM ☆22 · Jul 23, 2017 · Updated 8 years ago
- Text Classification using Bag of Words and TF-IDF models with K-Nearest Neighbor Algorithm ☆11 · Aug 2, 2017 · Updated 8 years ago
- Chinese text classification with support vector machines ☆29 · May 28, 2018 · Updated 7 years ago
- SVM-based Chinese text classification ☆13 · Mar 13, 2022 · Updated 4 years ago
- 📃 Your personal AI legal advisor (competition project) ☆24 · Feb 26, 2024 · Updated 2 years ago
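- Text classification of Sina News with scikit-learn. The dataset contains 1 million documents in 10 categories, split 1:1 between test and training sets. The classifiers are SVM and Bayes, with Bayes as the baseline. ☆110 · Dec 24, 2018 · Updated 7 years ago
- A visualization system for natural language processing that automatically generates word clouds, extracts key information from articles, shows topic distributions across multiple documents, classifies text, and displays charts of business data. ☆38 · Jan 27, 2021 · Updated 5 years ago
- Multi-label text classification ☆53 · Jun 8, 2019 · Updated 6 years ago
- A spider for CNKI patent content, for study and communication only, not for business use. ☆123 · Dec 21, 2017 · Updated 8 years ago
- Analysis of listed companies' annual reports ☆12 · Jul 16, 2019 · Updated 6 years ago
- Thesis code + review text: data collection + data cleaning + text vectorization + training classifiers (KNN + Naive Bayes + SVM) on the data + result evaluation ☆55 · May 17, 2022 · Updated 3 years ago
- The "Daguan Cup" (达观杯) long-text intelligent processing challenge. Daguan Data provides a set of long texts with category labels and asks participants to apply state-of-the-art NLP and AI techniques to analyze the internal structure and semantics of the text and build an accurate text classification model. ☆10 · Jul 20, 2018 · Updated 7 years ago
- Qimen is named after the art of Qimen Dunjia (奇门遁甲); a tool for extracting various kinds of entities. ☆29 · Jan 12, 2020 · Updated 6 years ago
- A crawler for Web of Science data, focused especially on the analysis of citation data ☆16 · Dec 14, 2018 · Updated 7 years ago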
- pyspark + Word2Vec + Tfidf + LSH: article similarity recommendation ☆26 · Mar 5, 2020 · Updated 6 years ago
- Python crawler for the Reexamination Board's invalidation decisions and reexamination decisions ☆16 · Mar 5, 2019 · Updated 7 years ago
- Implementation of Deep Dirichlet Multinomial Regression in python + cython. ☆16 · Mar 7, 2018 · Updated 8 years ago
- Semi-supervised short-text classification using Embedding + LSTM in the keras framework ☆16 · Nov 13, 2017 · Updated 8 years ago
- Package to parse and analyze trademark data from the United States Patent and Trademark Office ☆14 · Apr 5, 2017 · Updated 9 years ago
- Feature selection for text classification ☆11 · Aug 12, 2017 · Updated 8 years ago
- Facial age recognition ☆24 · Apr 26, 2019 · Updated 6 years ago
- This project is simply for the purpose of education. The primary objective of this project is to build a deep-learning model to identify … ☆10 · Apr 1, 2021 · Updated 5 years ago
- A sitting-posture recognition tool built on Baidu AI's mature face detection and human body analysis APIs ☆12 · May 20, 2022 · Updated 3 years ago
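- A series of analyses of collected legal documents, including automatic segmentation according to a specification, case similarity computation, case clustering, and recommendation of legal provisions (experiments currently use marriage cases; extensible to other domains). ☆203 · Mar 21, 2017 · Updated 9 years ago
- ☆17 · Dec 16, 2015 · Updated 10 years ago
- Chinese word segmentation algorithm based on entropy (entropy-based, no corpus required) ☆11 · Feb 27, 2018 · Updated 8 years ago
- Application for processing Chinese text: Sentiment, Keywords, Abstract ☆10 · Apr 13, 2017 · Updated 8 years ago
- Implements k-means, KNN, SVM, Bayes, topic_extraction, and other algorithms with scikit-learn, and evaluates classification accuracy, recall, and F-score. The corpus is Chinese text. ☆43 · Jul 23, 2017 · Updated 8 years ago
- 安坐sity: sitting-posture correction based on visual recognition ☆12 · Apr 26, 2021 · Updated 4 years ago
- RankNet, LambdaRank, LambdaMART, GBrank ☆14 · Nov 16, 2013 · Updated 12 years ago
- A personal hands-on summary of relation extraction and of using open-source toolkits ☆55 · Dec 5, 2018 · Updated 7 years ago
- Extracts keywords from Chinese text using tf-idf, TextRank4ZH, and other methods; extracts abstracts and keywords from Chinese text ☆34 · Dec 12, 2018 · Updated 7 years ago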
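- Baseline for the internet news sentiment analysis competition task ☆42 · Sep 18, 2019 · Updated 6 years ago
- This project mainly studies how large models perform on a standalone legal dataset. It currently supports training, prediction, validation, and online deployment of belle- and chatglm-related models, and also adds crawler code, langchain, database-backed prediction, and other features. ☆12 · Jul 16, 2023 · Updated 2 years ago
- Reproduces the code of the paper 《基于主题模型的短文本关键词抽取及扩展》 (keyword extraction and expansion for short texts based on topic models) ☆31 · Nov 11, 2020 · Updated 5 years ago
- Uses a portion of 2018 Dianping (大众点评) user reviews as the dataset, 4.4 million reviews before filtering. After labeling the dataset and applying Chinese text preprocessing, feature extraction, and feature weighting, classic machine learning methods such as SVM, Naive Bayes, and AdaBoost are used for classification, followed by training and classification with a Bi-LSTM deep neural network. ☆13 · Nov 11, 2021 · Updated 4 years ago
- Chinese version code for the paper "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks" ☆11 · Jul 25, 2019 · Updated 6 years ago
- Temporal and Causal Reasoning (dataset) ☆10 · Apr 19, 2022 · Updated 3 years ago
- Multi-label text classification, multi-label classification, text classification, multi-label, classifier, BERT, seq2seq, attention, multi-label-classification ☆804 · Dec 11, 2024 · Updated last year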