A Python toolkit for file processing, text cleaning, and data splitting.
☆36 Oct 18, 2022 Updated 3 years ago
Alternatives and similar repositories for Takin
Users who are interested in Takin are comparing it to the libraries listed below.
- This repository contains datasets (including the test set) for the EMNLP-IJCNLP 2019 paper "BiPaR: A Bilingual Parallel Dataset for Multilingu… ☆23 Jul 13, 2021 Updated 4 years ago
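- Code examples for four Simple Transformers tasks (classification, named entity recognition, machine reading comprehension, language model fine-tuning), with support for switching between multiple pretrained models. ☆23 Jun 7, 2022 Updated 4 years ago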
- LaTeX Thesis Template for Soochow University ☆10 Mar 28, 2021 Updated 5 years ago
- TXT text corpus data cleaning: 1) merge TXT files; 2) filter out noise strings; 3) mask person names, place names, and organization names; 4) convert other encodings to UTF-8 ☆19 Oct 14, 2022 Updated 3 years ago
- MNBVC project: ShareGPT corpus cleaning ☆16 Oct 4, 2023 Updated 2 years ago
- Open-source QG (Question Generation) system, built on PyTorch and Transformer ☆55 Jul 25, 2024 Updated last year
- Extract Chinese/English QA Data from WikiHow pages. ☆17 May 21, 2023 Updated 3 years ago
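- Sentence perplexity calculation based on a Chinese GPT2 pretrained model ☆15 Apr 20, 2023 Updated 3 years ago
- Chinese text data cleaning: remove URLs, remove characters other than Chinese, English, and digits, segment words, remove stop words, remove blank lines (add custom cleaning steps as the text requires) ☆17 May 5, 2019 Updated 7 years ago
- NLP pre-/post-processing tools. ☆30 Mar 31, 2025 Updated last year
- Extracts error-correction corpora from Wikipedia edit history data. ☆12 Apr 6, 2022 Updated 4 years ago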
- Using an LLM to chat with knowledge ☆21 Aug 12, 2023 Updated 2 years ago
- Code for the ACL 2019 paper "Cross-Lingual Training for Automatic Question Generation" ☆14 Jun 30, 2019 Updated 7 years ago
- Review text classification based on Douban movie ratings; builds word vectors with tf-idf/word2vec/BERT and classifies with SVM and logistic regression models ☆18 Jan 8, 2022 Updated 4 years ago
- Analysis codes for Laser-Induced Breakdown Spectroscopy data ☆10 Aug 19, 2017 Updated 8 years ago
- Advancing Spatial-Temporal Rock Fracture Prediction with Virtual Camera-Based Data Augmentation ☆13 Jan 19, 2025 Updated last year
- Crawlers for Douban, Zhihu, Mafengwo, TripAdvisor, Twitter, and other sites ☆24 Dec 13, 2017 Updated 8 years ago
- CamRest676 is an English dataset; I translated it into Chinese for training NLU. ☆12 Dec 20, 2017 Updated 8 years ago
- The Wizard-of-Oz code used for collecting goal-oriented dialogue systems ☆13 Oct 30, 2017 Updated 8 years ago
- Implements an RNN model for the DSTC2 task ☆16 Jan 25, 2019 Updated 7 years ago
- Neural Paraphrase Generation based on OpenNMT-py ☆12 Jan 2, 2018 Updated 8 years ago
- LLM evaluation on 2024 Chinese Gaokao Mathematics: a zero-contamination benchmark with dual prompt formats ☆21 Apr 15, 2026 Updated 2 months ago
- This repository contains code and models for the paper: Semantic Graphs for Generating Deep Questions (ACL 2020). ☆65 Jan 20, 2024 Updated 2 years ago
- Stochastic Answer Networks (SAN) for Machine Reading Comprehension ☆149 Nov 26, 2018 Updated 7 years ago
- An easy-to-use sequence labeling project (achieves state-of-the-art results on ATIS data) with PyTorch ☆15 Nov 21, 2018 Updated 7 years ago
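- Notes on deep learning and NLP ☆27 Jun 17, 2019 Updated 7 years ago
- Uses JD.com reviews as the dataset and classifies them with common machine learning algorithms such as KNN, SVM, logistic regression, Bayes, and XGBoost; with deep learning models including CNN, RNN, combined CNN and RNN, Bi-GRU, and BERT; and with text classifiers built on the fastNLP framework. ☆31 Jul 2, 2020 Updated 6 years ago
- Sun Yat-sen University natural language processing project: Chinese word segmentation (sequence labeling / named entity recognition). Keras implementation with a BiLSTM+CRF framework. ☆18 Jan 30, 2021 Updated 5 years ago
- PaddlePaddle regular competition: first-place solution for September in Chinese news headline classification, score 0.9+; trains and optimizes a 14-class news classification model by fine-tuning pretrained models with PaddleNLP ☆19 Oct 15, 2021 Updated 4 years ago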
- A Large-Scale Dataset for Long Text and Multi-Table Summarization ☆18 Feb 21, 2024 Updated 2 years ago
- Batch processor that enables large content to be digested by Ollama, focused on book processing and translations by default, fully, configu… ☆36 Oct 27, 2025 Updated 8 months ago
- Toward Scalable Neural Dialogue State Tracking Model ☆20 Sep 23, 2022 Updated 3 years ago
- A simple MCP ODBC server using FastAPI, ODBC and SQLAlchemy. ☆24 May 23, 2025 Updated last year
- sailVina reverse docking script for Linux ☆10 Feb 14, 2021 Updated 5 years ago
- ☆40 Jan 3, 2023 Updated 3 years ago
- Materials for AACL-IJCNLP-2022 tutorial: Efficient and Robust Knowledge Graph Construction ☆28 Feb 3, 2023 Updated 3 years ago
- The source code of the paper 'Dynamic Knowledge Routing Network For Target-Guided Open-Domain Conversation' ☆24 Mar 24, 2023 Updated 3 years ago
- Sparse Multilabel Categorical Crossentropy ☆11 Sep 10, 2023 Updated 2 years ago
- Analyzing knowledge graph embedding methods, including TransE, DistMult, CP, SimplE, ComplEx, Quaternion ☆28 May 23, 2023 Updated 3 years ago