xv44586/Chinese-instruction-datasets

xv44586 / Chinese-instruction-datasets

Chinese instruction-tuning datasets

☆143

Alternatives and similar repositories for Chinese-instruction-datasets

Users that are interested in Chinese-instruction-datasets are comparing it to the libraries listed below.

zejunwang1 / fastMatch
Large-scale exact string matching tool
☆17 · Mar 7, 2025 · Updated last year
tianchiguaixia / qwen1.5-ner
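Fine-tunes the Qwen1.5-0.5B-Chat model for general information extraction. The goals are to evaluate generative methods against extractive NER, to give beginners a simple fine-tuning workflow with minimal code, and to show data format processing for large-model training.
☆14 · Sep 6, 2024 · Updated last year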
yuanzhoulvpi2017 / questionAnswerSystem
A bot built on sentence-transformers that converts text to vectors
☆47 · Aug 22, 2022 · Updated 3 years ago
zejunwang1 / darmatch
A highly efficient string matching tool that supports forward/backward maximum matching word segmentation and exact multi-pattern string matching
☆16 · Jul 29, 2023 · Updated 2 years ago
Zheng0428 / COIG-Kun
☆36 · Sep 6, 2024 · Updated last year
LianjiaTech / BELLE
BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational large language model)
☆8,273 · Oct 16, 2024 · Updated last year
yanqiangmiffy / InstructGLM
ChatGLM-6B instruction learning | instruction data | Instruct
☆651 · Apr 10, 2023 · Updated 3 years ago
yangjianxin1 / Firefly
Firefly: a training tool for large models that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, …
☆6,647 · Oct 24, 2024 · Updated last year
BAAI-Zlab / COIG
☆128 · May 27, 2023 · Updated 3 years ago
beichao1314 / Open-Llama
The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF.
☆68 · Mar 27, 2023 · Updated 3 years ago
zejunwang1 / CSTS
Chinese natural language inference and semantic similarity datasets
☆366 · Jan 5, 2022 · Updated 4 years ago
xlxwalex / FCGEC
The Corpus & Code for EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction" | FCGEC Chinese grammatical error correction corpus and STG model
☆121 · Apr 12, 2026 · Updated 2 months ago
zjunlp / IEPile
[ACL 2024] IEPile: A Large-Scale Information Extraction Corpus
☆215 · Jan 9, 2025 · Updated last year
learnerzhang / rasa_usage
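Natural language features implemented with the rasa framework: entity recognition, text classification, coreference resolution, relation extraction, and more
☆17 · May 22, 2023 · Updated 3 years ago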
dandelionsllm / pandallm
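Panda is an open-source overseas Chinese large language model project launched in May 2023. It explores the full technology stack of the large-model era and aims to advance innovation and collaboration in Chinese natural language processing.
☆1,032 · Oct 19, 2023 · Updated 2 years ago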
carbonz0 / alpaca-chinese-dataset
Alpaca Chinese instruction fine-tuning dataset
☆395 · Mar 26, 2023 · Updated 3 years ago
ydli-ai / CSL
[COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset
☆673 · Jun 19, 2023 · Updated 3 years ago
ChaosWang666 / Ziya-LLaMA-13B-deployment
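Ziya-LLaMA-13B is a 13-billion-parameter large-scale pretrained model built by IDEA on LLaMA, with capabilities including translation, programming, text classification, information extraction, summarization, copywriting, commonsense question answering, and mathematical calculation. The Ziya general-purpose model has completed three training stages: large-scale pretraining, multi-task supervised fine-tuning, and learning from human feedback. This document is mainly for Ziya-…
☆46 · Jun 9, 2023 · Updated 3 years ago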
CVI-SZU / Linly
Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pretraining and instruction fine-tuning datasets
☆3,046 · Apr 14, 2024 · Updated 2 years ago
Wusiwei0410 / SciMMIR
☆25 · Aug 1, 2024 · Updated last year
OFA-Sys / InsTag
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
☆287 · Aug 20, 2023 · Updated 2 years ago
zejunwang1 / CTCDataset
A collection of Chinese text correction datasets
☆44 · Mar 24, 2026 · Updated 3 months ago
beeevita / Classical-Chinese-NER-RE-Dataset
A dataset used for NLP tasks.
☆10 · Apr 17, 2021 · Updated 5 years ago
beyondguo / LLM-Tuning
Tuning LLMs with no tears💦; Sample Design Engineering (SDE) for more efficient downstream-tuning.
☆1,013 · Apr 27, 2024 · Updated 2 years ago
lilongxian / BaiYang-chatGLM2-6B
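(1) Rotary position embedding encoder with elastic interval normalization plus peft LoRA quantized training, improving performance on inputs of tens of thousands of tokens. (2) Evidence-theory explanation learning to improve the model's complex logical reasoning. (3) Compatible with the alpaca data format.
☆44 · Jul 19, 2023 · Updated 2 years ago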
ssbuild / aigc_evals
aigc evals
☆10 · Dec 2, 2023 · Updated 2 years ago
chenpipi0807 / LTX-Video-Trainer-GUI
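LTX-Video-Trainer-GUI is a GUI tool for training LTX video LoRA models, supporting LoRA training for video generation through a simple interface. The trainer provides an intuitive GUI that lets users set up and launch training without writing complex code.
☆13 · Jul 18, 2025 · Updated 11 months ago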
lonePatient / NeZha_Chinese_PyTorch
NEZHA: Neural Contextualized Representation for Chinese Language Understanding
☆258 · Aug 13, 2021 · Updated 4 years ago
edzq / SciER
☆27 · Nov 28, 2024 · Updated last year
SupritYoung / Zhongjing
A Chinese medical ChatGPT based on LLaMA, trained on a large-scale pretraining corpus and a multi-turn dialogue dataset.
☆396 · Dec 12, 2023 · Updated 2 years ago
jingjingli01 / TGLS
TGLS: Unsupervised Text Generation by Learning from Search
☆25 · Jan 5, 2021 · Updated 5 years ago
Dakingrai / neuron-analysis-cot-arithmetic-reasoning
☆14 · Feb 24, 2025 · Updated last year
lonePatient / awesome-pretrained-chinese-nlp-models
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
☆5,574 · Jun 19, 2026 · Updated 3 weeks ago
dhcode-cpp / grpo-loss
☆44 · Mar 6, 2025 · Updated last year
nghuyong / cscd-ns
Code and data for "CSCD-NS: a Chinese Spelling Check Dataset for Native Speakers"
☆83 · Aug 18, 2024 · Updated last year
RUCAIBox / SWE-World
☆47 · Mar 6, 2026 · Updated 4 months ago
RUC-GSAI / Llama-3-SynE
Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continued pretraining to improve …
☆40 · May 31, 2025 · Updated last year
CLUEbenchmark / CLUE
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
☆4,267 · Feb 6, 2026 · Updated 5 months ago
lhwcv / nCoV_sentence_simi
nCoV related sentence similarity by BERT
☆19 · Mar 18, 2020 · Updated 6 years ago