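Chinese text data cleaning: URL removal, removal of non-Chinese/English/digit characters, word segmentation, stopword removal, and blank-line removal (add custom cleaning steps as your text requires)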
☆17 · May 5, 2019 · Updated 6 years ago
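The cleaning pipeline named in the description above can be sketched in Python roughly as follows. This is a hypothetical illustration, not the code from chinese-text-clean: the regexes, the names `clean_lines` and `fallback_tokenize`, and treating the basic CJK block (U+4E00 to U+9FFF) as "Chinese" are all assumptions. The fallback tokenizer splits Chinese per character only so the sketch runs without extra packages; a real pipeline would plug in a word segmenter.

```python
import re
from typing import Callable, Iterable, List, Set

# Assumed URL pattern: "http://", "https://" or "www." followed by non-whitespace.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# Any run of characters that is not a basic CJK ideograph (U+4E00 to U+9FFF),
# an ASCII letter, or an ASCII digit.
NON_KEEP_RE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]+")
# Fallback tokens: one Chinese character, or one run of ASCII letters/digits.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def fallback_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for a real Chinese segmenter (e.g. jieba)."""
    return TOKEN_RE.findall(text)


def clean_lines(
    lines: Iterable[str],
    stopwords: Set[str],
    tokenize: Callable[[str], List[str]] = fallback_tokenize,
) -> List[str]:
    """Remove URLs and unwanted characters, segment, drop stopwords and blank lines."""
    cleaned: List[str] = []
    for line in lines:
        # 1. Strip URLs first, so their letters are not kept by step 2.
        line = URL_RE.sub(" ", line)
        # 2. Replace everything except Chinese, English letters and digits with a space.
        line = NON_KEEP_RE.sub(" ", line)
        # 3. Segment, then 4. drop whitespace-only tokens and stopwords.
        tokens = [t for t in tokenize(line) if t.strip() and t not in stopwords]
        # 5. Skip lines that end up empty (blank-line removal).
        if tokens:
            cleaned.append(" ".join(tokens))
    return cleaned


if __name__ == "__main__":
    sample = ["访问 https://example.com 了解更多!!", "", "我爱 NLP 2019 的"]
    print(clean_lines(sample, stopwords={"的"}))
    # ['访 问 了 解 更 多', '我 爱 NLP 2019']
```

If jieba is installed, passing `tokenize=jieba.lcut` swaps in word-level segmentation. With the per-character fallback, a stopword such as 了 would also be stripped out of 了解, which is one reason a proper segmenter matters for stopword removal.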
Alternatives and similar repositories for chinese-text-clean
Users that are interested in chinese-text-clean are comparing it to the libraries listed below
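- Spark with Python: study notes ☆11 · Sep 25, 2018 · Updated 7 years ago
- Recent papers on Graph Neural Network-based recommender systems. ☆12 · Aug 21, 2023 · Updated 2 years ago
- Chinese corpus: a large number of manually annotated samples, very valuable!!! ☆11 · Aug 15, 2019 · Updated 6 years ago
- A rewarded-video points system built on Mini Program local storage ☆11 · Jun 19, 2019 · Updated 6 years ago
- Web crawling, anti-crawling, JS reverse engineering, Android reverse engineering, AST ☆12 · Sep 14, 2020 · Updated 5 years ago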
- Object annotation maker in VOC Pascal format using object images and background images ☆10 · Feb 27, 2021 · Updated 5 years ago
- Scraped reviews from OpenRice for sentiment analysis. Formatted to use with BERT. ☆11 · Apr 9, 2020 · Updated 5 years ago
- Python cffi binding to CppJieba ☆15 · Sep 15, 2020 · Updated 5 years ago
- A span-based joint named entity recognition (NER) and relation extraction model. ☆11 · Aug 5, 2020 · Updated 5 years ago
- The source code of "Deep attention diffusion graph neural networks for text classification" ☆13 · Nov 11, 2023 · Updated 2 years ago
- In this project, we need to find out commercial products listed on Google that refer to the same entity across Amazon by comparing the si… ☆11 · Nov 7, 2016 · Updated 9 years ago
- Any Stream to Reinforcement Learning Environment (Time Series Data, Stock Market) ☆11 · Oct 10, 2018 · Updated 7 years ago
- ☆12 · Dec 22, 2020 · Updated 5 years ago
- Text-processing library; currently includes new-word discovery, string matching, and other features. ☆15 · Jul 6, 2021 · Updated 4 years ago
- MCP agent/client/server implementation for a private knowledge base ☆22 · May 19, 2025 · Updated 9 months ago
- Fast graph database in pure Python ☆16 · Aug 31, 2021 · Updated 4 years ago
- A structured parsing technique for NER ☆15 · May 26, 2023 · Updated 2 years ago
- Created an inverted index in Python for document retrieval ☆13 · Oct 7, 2018 · Updated 7 years ago
- Parsing PDF files with PDFium ☆12 · Nov 7, 2024 · Updated last year
- pdf2xml from http://sourceforge.net/projects/pdf2xml/ ☆16 · Mar 6, 2013 · Updated 13 years ago
- Utils for mapping dataclass fields to dictionary keys, making it possible to create an instance of a dataclass from a dictionary. ☆14 · Jun 22, 2023 · Updated 2 years ago
- Training a character embedding from scratch following Word2Vec, using TensorFlow. ☆14 · Mar 24, 2023 · Updated 2 years ago
- Stock investment can be one of the ways to manage one’s assets. Technical analysis is sometimes used in financial markets to assist trader… ☆12 · Sep 30, 2020 · Updated 5 years ago
- Simple in-memory data cache designed for ML applications. Built using Redis and Apache Arrow's Plasma in-memory store ☆11 · Oct 13, 2020 · Updated 5 years ago
- An intelligent OCR to detect tables and plain text inside PDFs and obtain a CSV file and a TXT file from them ☆15 · Sep 11, 2018 · Updated 7 years ago
- Large Language Models in Molecular Embeddings ☆12 · May 1, 2024 · Updated last year
- The first open Hakka word-segmentation tool ☆13 · Jun 10, 2018 · Updated 7 years ago
- A library that provides Conflict-Free Replicated Data Types (CRDTs) for distributed Python applications. ☆17 · Jan 10, 2019 · Updated 7 years ago
- Understanding Word2Vec with Gensim and Elang (Python Packages) ☆13 · Apr 24, 2020 · Updated 5 years ago
- ☆15 · Feb 5, 2019 · Updated 7 years ago
- Named Entity Recognition via Attention-based CNNs-BiLSTM-CRF ☆15 · Jun 27, 2018 · Updated 7 years ago
- ☆20 · Jul 22, 2021 · Updated 4 years ago
- A simple Chinese NER implementation using Bert-Bi-LSTM-CRF on the fastNLP framework ☆15 · Sep 25, 2020 · Updated 5 years ago
- BiLSTM+CNN+CRF NER, using PyTorch ☆16 · May 26, 2019 · Updated 6 years ago
- Fine-tuning of Retrieval-Augmented Generation (RAG) with a custom knowledge source. ☆13 · Feb 10, 2021 · Updated 5 years ago
- An implementation of bidirectional LSTM-CRF for Named Entity Relationship on a custom corpus with custom word embeddings ☆14 · Apr 9, 2019 · Updated 6 years ago
- A Python open-source distributed in-memory cache and database. ☆21 · Jul 30, 2020 · Updated 5 years ago
- Part-of-Speech Tagging Models in Python ☆16 · Oct 7, 2019 · Updated 6 years ago
- Applications of ELMo to QA, text classification, and other NLP tasks ☆15 · Apr 13, 2019 · Updated 6 years ago