Chinese instruction tuning datasets
☆143 · Apr 10, 2024 · Updated 2 years ago
Alternatives and similar repositories for Chinese-instruction-datasets
Users that are interested in Chinese-instruction-datasets are comparing it to the libraries listed below.
- Large-scale exact string matching tool ☆17 · Mar 7, 2025 · Updated last year
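- Fine-tunes the Qwen1.5-0.5B-Chat model on general information extraction tasks, aiming to: verify how well the generative approach performs compared with extractive NER; give beginners a simple model fine-tuning workflow with as little code as possible; and handle data formatting for large-model training. ☆14 · Sep 6, 2024 · Updated last year
- A bot that converts text to vectors, built on sentence-transformers ☆47 · Aug 22, 2022 · Updated 3 years ago
- A highly efficient string matching tool that supports forward/backward maximum matching word segmentation and multi-pattern exact string matching ☆16 · Jul 29, 2023 · Updated 2 years ago
- Chinese safety prompts for evaluating and improving the safety of LLMs. ☆1,165 · Feb 27, 2024 · Updated 2 years ago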
- ☆36 · Sep 6, 2024 · Updated last year
- BELLE: Be Everyone's Large Language model Engine (open-source Chinese conversational large language model) ☆8,276 · Oct 16, 2024 · Updated last year
- ChatGLM-6B instruction learning | instruction data | Instruct ☆651 · Apr 10, 2023 · Updated 3 years ago
- Firefly: a large-model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, … ☆6,643 · Oct 24, 2024 · Updated last year
- ☆129 · May 27, 2023 · Updated 3 years ago
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF. ☆68 · Mar 27, 2023 · Updated 3 years ago
- Chinese natural language inference and semantic similarity datasets ☆366 · Jan 5, 2022 · Updated 4 years ago
- The Corpus & Code for EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction" | FCGEC Chinese grammatical error correction corpus and STG model ☆121 · Apr 12, 2026 · Updated last month
- [ACL 2024] IEPile: A Large-Scale Information Extraction Corpus ☆213 · Jan 9, 2025 · Updated last year
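- Natural language features implemented on the rasa framework: entity recognition, text classification, coreference resolution, relation extraction, and more ☆17 · May 22, 2023 · Updated 3 years ago
- Panda is an open-source overseas Chinese large language model project launched in May 2023. It explores the full technology stack of the large-model era and aims to promote innovation and collaboration in Chinese natural language processing. ☆1,033 · Oct 19, 2023 · Updated 2 years ago
- Alpaca Chinese instruction fine-tuning dataset ☆395 · Mar 26, 2023 · Updated 3 years ago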
- [ACL 2024] "Understanding and Patching Compositional Reasoning in LLMs" ☆14 · Aug 28, 2024 · Updated last year
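- Ziya-LLaMA-13B is a 13-billion-parameter large-scale pretrained model built by IDEA on LLaMa, capable of translation, programming, text classification, information extraction, summarization, copywriting, commonsense question answering, and mathematical calculation. The Ziya general-purpose large model has completed a three-stage training process: large-scale pretraining, multi-task supervised fine-tuning, and learning from human feedback. This document is mainly intended for Ziya-… ☆46 · Jun 9, 2023 · Updated 2 years ago
- [COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset ☆668 · Jun 19, 2023 · Updated 2 years ago
- Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pretraining and instruction fine-tuning datasets ☆3,051 · Apr 14, 2024 · Updated 2 years ago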
- ☆25 · Aug 1, 2024 · Updated last year
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning ☆286 · Aug 20, 2023 · Updated 2 years ago
- A collection of Chinese text correction datasets ☆39 · Mar 24, 2026 · Updated 2 months ago
- A dataset used for NLP tasks. ☆10 · Apr 17, 2021 · Updated 5 years ago
- Tuning LLMs with no tears💦; Sample Design Engineering (SDE) for more efficient downstream-tuning. ☆1,014 · Apr 27, 2024 · Updated 2 years ago
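- (1) A rotary position embedding encoder with elastic interval normalization, plus peft LoRA quantized training, improving support for contexts on the order of ten thousand tokens. (2) Evidence-theory-based explanation learning to strengthen the model's complex logical reasoning. (3) Compatible with the alpaca data format. ☆44 · Jul 19, 2023 · Updated 2 years ago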
- train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism ☆224 · Nov 21, 2023 · Updated 2 years ago
- aigc evals ☆10 · Dec 2, 2023 · Updated 2 years ago
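- LTX-Video-Trainer-GUI is a GUI tool for training LTX video LoRA models; it lets you train LoRA models for video generation through a simple interface. The trainer provides an intuitive GUI so users can configure and launch training without writing complex code. ☆13 · Jul 18, 2025 · Updated 10 months ago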
- NEZHA: Neural Contextualized Representation for Chinese Language Understanding ☆259 · Aug 13, 2021 · Updated 4 years ago
- ☆25 · Nov 28, 2024 · Updated last year
- A Chinese medical ChatGPT based on LLaMa, trained on a large-scale pretraining corpus and a multi-turn dialogue dataset. ☆393 · Dec 12, 2023 · Updated 2 years ago
- TGLS: Unsupervised Text Generation by Learning from Search ☆25 · Jan 5, 2021 · Updated 5 years ago
- ☆14 · Feb 24, 2025 · Updated last year
- Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models ☆5,564 · May 15, 2026 · Updated 2 weeks ago
- ☆43 · Mar 6, 2025 · Updated last year
- Code and data for "CSCD-NS: a Chinese Spelling Check Dataset for Native Speakers" ☆83 · Aug 18, 2024 · Updated last year
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continual pre-training to improve … ☆40 · May 31, 2025 · Updated last year