中文 Instruction tuning datasets
☆143Apr 10, 2024Updated 2 years ago
Alternatives and similar repositories for Chinese-instruction-datasets
Users that are interested in Chinese-instruction-datasets are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Large-scale exact string matching tool☆17Mar 7, 2025Updated last year
- 使用Qwen1.5-0.5B-Chat模型进行通用信息抽取任务的微调,旨在: 验证生成式方法相较于抽取式NER的效果; 为新手提供简易的模型微调流程,尽量减少代码量; 大模型训练的数据格式处理。☆14Sep 6, 2024Updated last year
- 基于sentence-transformers实现文本转向量的机器人☆47Aug 22, 2022Updated 3 years ago
- 一个非常高效的字符串匹配工具,支持正向/反向最大匹配分词和多模式字符串精确匹配☆16Jul 29, 2023Updated 2 years ago
- Chinese safety prompts for evaluating and improving the safety of LLMs. 中文安全prompts,用于评估和提升大模型的安全性。☆1,148Feb 27, 2024Updated 2 years ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- ☆36Sep 6, 2024Updated last year
- 🦩 Official repository of paper "Visual Instruction Tuning with Polite Flamingo" (AAAI-24 Oral)☆65Dec 9, 2023Updated 2 years ago
- BELLE: Be Everyone's Large Language model Engine(开源中文对话大模型)☆8,285Oct 16, 2024Updated last year
- ChatGLM-6B 指令学习|指令数据|Instruct☆652Apr 10, 2023Updated 3 years ago
- Firefly: 大 模型训练工具,支持训练Qwen2.5、Qwen2、Yi1.5、Phi-3、Llama3、Gemma、MiniCPM、Yi、Deepseek、Orion、Xverse、Mixtral-8x7B、Zephyr、Mistral、Baichuan2、Llma2、…☆6,649Oct 24, 2024Updated last year
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF.☆68Mar 27, 2023Updated 3 years ago
- ☆128May 27, 2023Updated 2 years ago
- 中文自然语言推理与语义相似度数据集☆366Jan 5, 2022Updated 4 years ago
- The Corpus & Code for EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction" | FCGEC中文语法纠错语料及STG模型☆121Apr 12, 2026Updated last week
- GPUs on demand by Runpod - Special Offer Available • AdRun AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
- This repo contains the code for our paper "Iterative Edit-Based Unsupervised Sentence Simplification" accepted at ACL 2020.☆14Jul 19, 2021Updated 4 years ago
- [ACL 2024] IEPile: A Large-Scale Information Extraction Corpus☆212Jan 9, 2025Updated last year
- 基于rasa_框架实现指自然语言相关功能:实体识别、文本分类、代消解功能、关系抽取等☆17May 22, 2023Updated 2 years ago
- Panda项目是于2023年5月启动的开源海外中文大语言模型项目,致力于大模型时代探索整个技术栈,旨在推动中文自然语言处理领域的创新和合作。☆1,035Oct 19, 2023Updated 2 years ago
- alpaca中文指令微调数据集☆396Mar 26, 2023Updated 3 years ago
- [ACL 2024] "Understanding and Patching Compositional Reasoning in LLMs"☆13Aug 28, 2024Updated last year
- [COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset 中文科学文献数据集☆664Jun 19, 2023Updated 2 years ago
- Ziya-LLaMA-13B是IDEA基于LLaMa的130亿参数的大规模预训练模型,具备翻译,编程,文本分类,信息抽取,摘要,文案生成,常识问答和数学计算等能力。目前姜子牙通用大模型已完成大规模预训练、多任务有监督微调和人类反馈学习三阶段的训练过程。本文主要用于Ziya-…☆46Jun 9, 2023Updated 2 years ago
- Chinese-LLaMA 1&2、Chinese-Falcon 基础模型;ChatFlow中文对话模型;中文OpenLLaMA模型;NLP预训 练/指令微调数据集☆3,051Apr 14, 2024Updated 2 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆25Aug 1, 2024Updated last year
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning☆285Aug 20, 2023Updated 2 years ago
- 中文文本纠错数据集汇总☆36Mar 24, 2026Updated 3 weeks ago
- Tuning LLMs with no tears💦; Sample Design Engineering (SDE) for more efficient downstream-tuning.☆1,017Apr 27, 2024Updated last year
- (1)弹性区间标准化的旋转位置词嵌入编码器+peft LORA量化训练,提高万级tokens性能支持。(2)证据理论解释学习,提升模型的复杂逻辑推理能力(3)兼容alpaca数据格式。☆45Jul 19, 2023Updated 2 years ago
- ☆12Feb 28, 2025Updated last year
- aigc evals☆10Dec 2, 2023Updated 2 years ago
- LTX-Video-Trainer-GUI 是为LTX视频lora模型训练提供的GUI工具,支持通过简单的界面训练 LoRA 模型用于视频生成。本训练器提供了直观的 GUI 界面,使用户能够轻松设置和启动训练流程,无需编写复杂代码。☆13Jul 18, 2025Updated 9 months ago
- ☆25Nov 28, 2024Updated last year
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- A Chinese medical ChatGPT based on LLaMa, training from large-scale pretrain corpus and multi-turn dialogue dataset.☆392Dec 12, 2023Updated 2 years ago
- TGLS: Unsupervised Text Generation by Learning from Search☆25Jan 5, 2021Updated 5 years ago
- ☆14Feb 24, 2025Updated last year
- Awesome Pretrained Chinese NLP Models,高质量中文预训练模型&大模型&多模态模型&大语言模型集合☆5,554Apr 12, 2026Updated last week
- ☆43Mar 6, 2025Updated last year
- code and data for "CSCD-NS: a Chinese Spelling Check Dataset for Native Speakers"☆82Aug 18, 2024Updated last year
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | 继续预训练提升 …☆39May 31, 2025Updated 10 months ago