中文 Instruction tuning datasets
☆143Apr 10, 2024Updated 2 years ago
Alternatives and similar repositories for Chinese-instruction-datasets
Users that are interested in Chinese-instruction-datasets are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Large-scale exact string matching tool☆17Mar 7, 2025Updated last year
- 使用Qwen1.5-0.5B-Chat模型进行通用信息抽取任务的微调,旨在: 验证生成式方法相较于抽取式NER的效果; 为新手提供简易的模型微调流程,尽量减少代码量; 大模型训练的数据格式处理。☆14Sep 6, 2024Updated last year
- 基于sentence-transformers实现文本转向量的机器人☆47Aug 22, 2022Updated 3 years ago
- 一个非常高效的字符串匹配工具,支持正向/反向最大匹配分词和多模式字符串精确匹配☆16Jul 29, 2023Updated 2 years ago
- Chinese safety prompts for evaluating and improving the safety of LLMs. 中文安全prompts,用于评估和提升大模型的安全性。☆1,153Feb 27, 2024Updated 2 years ago
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- ☆36Sep 6, 2024Updated last year
- 🦩 Official repository of paper "Visual Instruction Tuning with Polite Flamingo" (AAAI-24 Oral)☆65Dec 9, 2023Updated 2 years ago
- BELLE: Be Everyone's Large Language model Engine(开源中文对话大模型)☆8,279Oct 16, 2024Updated last year
- ChatGLM-6B 指令学习|指令数据|Instruct☆652Apr 10, 2023Updated 3 years ago
- Firefly: 大模型训练工具,支持训练Qwen2.5、Qwen2、Yi1.5、Phi-3、Llama3、Gemma、MiniCPM、Yi、Deepseek、Orion、Xverse、Mixtral-8x7B、Zephyr、Mistral、Baichuan2、Llma2、…☆6,643Oct 24, 2024Updated last year
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF.☆68Mar 27, 2023Updated 3 years ago
- ☆129May 27, 2023Updated 2 years ago
- 中文自然语言推理与语义相似度数据集☆366Jan 5, 2022Updated 4 years ago
- The Corpus & Code for EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction" | FCGEC中文语法纠错语料及STG模型☆121Apr 12, 2026Updated 3 weeks ago
- GPUs on demand by Runpod - Special Offer Available • AdRun AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
- 基于rasa_框架实现指自然语言相关功能:实体识别、文本分类、代消解功能、关系抽取等☆17May 22, 2023Updated 2 years ago
- Panda项目是于2023年5月启动的开源海外中文大语言模型项目,致力于大模型时代探索整个技术栈,旨在推动中文自然语言处理领域的创新和合作。☆1,034Oct 19, 2023Updated 2 years ago
- alpaca中文指令微调数据集☆396Mar 26, 2023Updated 3 years ago
- [COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset 中文科学文献数据集☆667Jun 19, 2023Updated 2 years ago
- Ziya-LLaMA-13B是IDEA基于LLaMa的130亿参数的大规模预训练模型,具备翻译,编程,文本分类,信息抽取,摘要,文案生成,常识问答和数学计算等能力。目前姜子牙通用大模型已完成大规模预训练、多任务有监督微调和人类反馈学习三阶段的训练过程。本文主要用于Ziya-…☆47Jun 9, 2023Updated 2 years ago
- Chinese-LLaMA 1&2、Chinese-Falcon 基础模型;ChatFlow中文对话模型;中文OpenLLaMA模型;NLP预训练/指令微调数据集☆3,050Apr 14, 2024Updated 2 years ago
- ☆25Aug 1, 2024Updated last year
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning☆285Aug 20, 2023Updated 2 years ago
- 中文文本纠错数据集汇总☆39Mar 24, 2026Updated last month
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- A dataset used for NLP tasks.☆10Apr 17, 2021Updated 5 years ago
- Tuning LLMs with no tears💦; Sample Design Engineering (SDE) for more efficient downstream-tuning.☆1,015Apr 27, 2024Updated 2 years ago
- (1)弹性区间标准化的旋转位置词嵌入编码器+peft LORA量化训练,提高万级tokens性能支持。(2)证据理论解释学习,提升模型的复杂逻辑推理能力(3)兼容alpaca数据格式。☆44Jul 19, 2023Updated 2 years ago
- train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism☆224Nov 21, 2023Updated 2 years ago
- aigc evals☆10Dec 2, 2023Updated 2 years ago
- NEZHA: Neural Contextualized Representation for Chinese Language Understanding☆259Aug 13, 2021Updated 4 years ago
- LTX-Video-Trainer-GUI 是为LTX视频lora模型训练提供的GUI工具,支持通过简单的界面训练 LoRA 模型用于视频生成。本训练器提供了直观的 GUI 界面,使用户能够轻松设置和启动训练流程,无需编写复杂代码。☆13Jul 18, 2025Updated 9 months ago
- ☆25Nov 28, 2024Updated last year
- A Chinese medical ChatGPT based on LLaMa, training from large-scale pretrain corpus and multi-turn dialogue dataset.☆392Dec 12, 2023Updated 2 years ago
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- TGLS: Unsupervised Text Generation by Learning from Search☆25Jan 5, 2021Updated 5 years ago
- Awesome Pretrained Chinese NLP Models,高质量中文预训练模型&大模型&多模态模型&大语言模型集合☆5,556May 3, 2026Updated last week
- ☆43Mar 6, 2025Updated last year
- code and data for "CSCD-NS: a Chinese Spelling Check Dataset for Native Speakers"☆83Aug 18, 2024Updated last year
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | 继续预训练提升 …☆40May 31, 2025Updated 11 months ago
- ☆38Mar 6, 2026Updated 2 months ago
- 中文语言理解测评基准 Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard☆4,257Feb 6, 2026Updated 3 months ago