2hip3ng / chinese-text-clean
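Chinese text data cleaning: remove URLs, remove characters other than Chinese, English, and digits, segment words, remove stop words, and remove blank lines (add custom cleaning steps as the text requires)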
☆17May 5, 2019Updated 6 years ago
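The cleaning steps named in the description can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the repository's actual code: the names `clean_line` and `clean_text`, the regular expressions, and the whitespace fallback tokenizer are all assumptions made for the example. A real Chinese word segmenter (for instance `jieba.lcut`) could be passed in as `tokenize`.

```python
import re

# Assumption: URLs are ASCII runs starting with http(s)://; stop at whitespace or CJK.
URL_RE = re.compile(r"https?://[^\s\u4e00-\u9fff]+")
# Assumption: "Chinese" means the CJK Unified Ideographs block U+4E00..U+9FFF.
# Anything that is not Chinese, an ASCII letter, or a digit becomes a space.
NON_KEPT_RE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]+")


def clean_line(line, tokenize=None, stopwords=frozenset()):
    """Return the cleaned tokens of one line (hypothetical helper)."""
    line = URL_RE.sub(" ", line)        # step 1: remove URLs
    line = NON_KEPT_RE.sub(" ", line)   # step 2: remove other characters
    # step 3: segment; falls back to whitespace splitting if no segmenter is given
    tokens = tokenize(line) if tokenize else line.split()
    # step 4: drop empty tokens and stop words
    return [t for t in tokens if t.strip() and t not in stopwords]


def clean_text(text, tokenize=None, stopwords=frozenset()):
    """Clean every line and drop lines that end up empty (step 5)."""
    cleaned = []
    for line in text.splitlines():
        tokens = clean_line(line, tokenize, stopwords)
        if tokens:
            cleaned.append(" ".join(tokens))
    return cleaned


if __name__ == "__main__":
    sample = "看这里 https://example.com/a?b=1 很好!!\n\n   \nPython3 的 教程"
    for row in clean_text(sample, stopwords={"的"}):
        print(row)
```

Passing the tokenizer and stop-word set as arguments keeps the sketch free of third-party dependencies and leaves room for the custom cleaning steps the description mentions.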
Alternatives and similar repositories for chinese-text-clean
Users that are interested in chinese-text-clean are comparing it to the libraries listed below
- Scraped reviews from OpenRice for sentiment analysis. Formatted to use with BERT.☆11Apr 9, 2020Updated 5 years ago
- Object annotation maker in VOC Pascal format using object images and background images☆10Feb 27, 2021Updated 4 years ago
- Exploring actor critic deep reinforcement learning methods for maximizing profits by learning stock trading strategies☆11Mar 24, 2023Updated 2 years ago
- ☆12Dec 22, 2020Updated 5 years ago
- In this project, we need to find out commercial products listed on Google that refer to the same entity across Amazon by comparing the si…☆11Nov 7, 2016Updated 9 years ago
- Python cffi binding to CppJieba☆15Sep 15, 2020Updated 5 years ago
- Any Stream to Reinforcement Learning Environment (Time Series Data, Stock Market)☆11Oct 10, 2018Updated 7 years ago
- A text-processing library; currently includes new word discovery, string matching, and other features.☆15Jul 6, 2021Updated 4 years ago
- A structured parsing technique for NER☆15May 26, 2023Updated 2 years ago
- Training from scratch a character embedding following Word2Vec, using tensorflow.☆14Mar 24, 2023Updated 2 years ago
- Utils for mapping dataclass fields to dictionary keys, making it possible to create an instance of a dataclass from a dictionary.☆14Jun 22, 2023Updated 2 years ago
- Created an inverted index in Python for document retrieval☆13Oct 7, 2018Updated 7 years ago
- Fast graph database in pure Python☆16Aug 31, 2021Updated 4 years ago
- Understanding Word2Vec with Gensim and Elang (Python Packages)☆13Apr 24, 2020Updated 5 years ago
- A library that provides Conflict Free Replicated Data Types (CRDTs) for distributed Python applications.☆17Jan 10, 2019Updated 7 years ago
- Parsing PDF files with PDFium☆12Nov 7, 2024Updated last year
- Stock investment can be one of the ways to manage one’s asset. Technical analysis is sometimes used in financial markets to assist trader…☆12Sep 30, 2020Updated 5 years ago
- Simple in memory data cache designed for ML applications. Built using Redis and Apache Arrow's Plasma in-memory store☆11Oct 13, 2020Updated 5 years ago
- An intelligent OCR to detect tables and pure text inside PDFs and obtain a CSV file and a TXT file from them☆15Sep 11, 2018Updated 7 years ago
- The first open Hakka word segmentation tool☆13Jun 10, 2018Updated 7 years ago
- This project is being developed to crawl stock A finance and trade data from websites, process the finance and trade data to get factors, and then …☆17Jan 12, 2023Updated 3 years ago
- A general graph manipulation python module☆16Jun 2, 2009Updated 16 years ago
- Named Entity Recognition via Attention-based CNNs-BiLSTM-CRF☆15Jun 27, 2018Updated 7 years ago
- BiLSTM+CNN+CRF NER, using pytorch☆16May 26, 2019Updated 6 years ago
- A simple Chinese NER implementation with Bert-Bi-LSTM-CRF, built on the fastNLP framework☆15Sep 25, 2020Updated 5 years ago
- Part-of-Speech Tagging Models in Python☆15Oct 7, 2019Updated 6 years ago
- ☆20Jul 22, 2021Updated 4 years ago
- Fine tuning of the Retrieval-Augmented Generation (RAG) with a custom knowledge source.☆13Feb 10, 2021Updated 5 years ago
- ☆19Nov 7, 2018Updated 7 years ago
- An implementation of bidirectional LSTM-CRF for Named Entity Relationship on custom corpus with custom word embeddings☆14Apr 9, 2019Updated 6 years ago
- Convert HTML tables to excel files☆15Jul 3, 2021Updated 4 years ago
- GUI useful to manually annotate text for Named Entity Recognition purposes☆14Jun 22, 2023Updated 2 years ago
- Examples using PyTorch-BigGraph☆17Jun 21, 2019Updated 6 years ago
- Applications of ELMO to NLP tasks such as question answering (QA) and text classification☆15Apr 13, 2019Updated 6 years ago
- a dictionary-like, file-based cache module for Python☆22Nov 19, 2024Updated last year
- Named-Entity-Recognition Workshop☆16May 27, 2019Updated 6 years ago
- Deep Reinforcement Learning for Stock trading task☆21Feb 9, 2021Updated 5 years ago
- A DeepWalk implementation for ontologies using NetworkX and Gensim☆19May 15, 2017Updated 8 years ago
- This is a Flask + Docker deployment of the PyTorch-based Named Entity Recognition (NER) Model (BiLSTM-CRF) in the Medical AI.☆23Jan 28, 2023Updated 3 years ago