A Python toolkit for file processing, text cleaning and data splitting.
☆36 · Oct 18, 2022 · Updated 3 years ago
Alternatives and similar repositories for Takin
Users interested in Takin are comparing it to the libraries listed below.
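- simple translate ☆12 · Mar 7, 2020 · Updated 6 years ago
- Code samples for four Simple Transformers tasks (classification, named entity recognition, machine reading comprehension, and language model fine-tuning), with support for switching between multiple pre-trained models. ☆23 · Jun 7, 2022 · Updated 4 years ago
- Source code and dataset for the paper "GECOR: An End-to-End Generative Ellipsis and Co-reference Resolution Model for Task-Oriented Dialo… ☆30 · Jul 22, 2023 · Updated 2 years ago
- Text corpus data cleaning for TXT files: (1) merge TXT files; (2) filter out noise strings; (3) mask person, place, and organization names; (4) convert other encodings to UTF-8. ☆19 · Oct 14, 2022 · Updated 3 years ago
- An open-source question generation (QG) system built with PyTorch and Transformer. ☆55 · Jul 25, 2024 · Updated last year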
- Extract Chinese/English QA data from WikiHow pages. ☆17 · May 21, 2023 · Updated 3 years ago
- Sentence perplexity calculation based on a Chinese GPT2 pre-trained model. ☆15 · Apr 20, 2023 · Updated 3 years ago
- Chinese text data cleaning: remove URLs; remove characters other than Chinese, English, and digits; segment words; remove stop words; remove blank lines (add custom cleaning as the text requires). ☆17 · May 5, 2019 · Updated 7 years ago
- The QA datasets used for DrQA evaluation. ☆14 · Nov 30, 2018 · Updated 7 years ago
- Extracts error-correction corpora from Wikipedia edit-history data. ☆12 · Apr 6, 2022 · Updated 4 years ago
- Code for the ACL 2019 paper "Cross Lingual Training for Automatic Question Generation" ☆14 · Jun 30, 2019 · Updated 6 years ago
- Review text classification based on Douban movie ratings: builds word vectors with TF-IDF, word2vec, or BERT and classifies with SVM and logistic regression models. ☆18 · Jan 8, 2022 · Updated 4 years ago
- Official Implementation of AAAI 2025 paper "MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symb… ☆49 · Dec 8, 2025 · Updated 6 months ago
- Analysis code for Laser-Induced Breakdown Spectroscopy data ☆10 · Aug 19, 2017 · Updated 8 years ago
- Advancing Spatial-Temporal Rock Fracture Prediction with Virtual Camera-Based Data Augmentation ☆12 · Jan 19, 2025 · Updated last year
- CamRest676 is an English dataset; I translated it into Chinese for NLU training. ☆12 · Dec 20, 2017 · Updated 8 years ago
- An RNN model implementation for the DSTC2 task ☆16 · Jan 25, 2019 · Updated 7 years ago
- Covers web crawling, databases, data analysis, machine learning, visualization, text analysis, GUI, and office automation. ☆14 · Jan 14, 2022 · Updated 4 years ago
- 💬 A simple online chat room. ☆11 · Apr 15, 2019 · Updated 7 years ago
- A knowledge-graph-based data analysis system for sci-tech novelty search that processes and analyzes data such as scientific and technical reports and literature. It uses techniques such as knowledge graphs and TextCNN text classification, with Vue for the front end and Spring Boot for the back end, MySQL and neo4j as databases, and ECharts charts to present the analysis results. ☆14 · Mar 28, 2025 · Updated last year
- Examples of using MGeo fine-tuned models ☆57 · Feb 9, 2023 · Updated 3 years ago
- LLM evaluation on 2024 Chinese Gaokao Mathematics: a zero-contamination benchmark with dual prompt formats ☆21 · Apr 15, 2026 · Updated last month
- This repository contains code and models for the paper: Semantic Graphs for Generating Deep Questions (ACL 2020). ☆65 · Jan 20, 2024 · Updated 2 years ago
- Stochastic Answer Networks (SAN) for Machine Reading Comprehension ☆149 · Nov 26, 2018 · Updated 7 years ago
- DocQues answers queries over longer and multiple documents, built on GPT-Index and GPT-3. ☆13 · Jan 1, 2023 · Updated 3 years ago
- A demonstration of how to train a custom tokenizer similar to TikToken.