Chinese instruction tuning datasets
☆143Apr 10, 2024Updated last year
Alternatives and similar repositories for Chinese-instruction-datasets
Users who are interested in Chinese-instruction-datasets are comparing it to the libraries listed below.
- Large-scale exact string matching tool☆17Mar 7, 2025Updated last year
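- Fine-tuning the Qwen1.5-0.5B-Chat model for general information extraction, with three aims: to evaluate the generative approach against extractive NER; to give beginners a simple fine-tuning workflow with minimal code; and to handle data formatting for large-model training.☆15Sep 6, 2024Updated last year
- A highly efficient string matching tool that supports forward/backward maximum-matching word segmentation and exact multi-pattern string matching☆16Jul 29, 2023Updated 2 years ago
- A bot that converts text to vectors, built on sentence-transformers☆47Aug 22, 2022Updated 3 years ago
- Chinese safety prompts for evaluating and improving the safety of LLMs.☆1,142Feb 27, 2024Updated 2 years ago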
- ☆36Sep 6, 2024Updated last year
- BELLE: Be Everyone's Large Language model Engine (an open-source Chinese dialogue large language model)☆8,287Oct 16, 2024Updated last year
- ChatGLM-6B instruction learning | instruction data | Instruct☆653Apr 10, 2023Updated 2 years ago
- Firefly: a large-model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, …☆6,652Oct 24, 2024Updated last year
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF.☆68Mar 27, 2023Updated 3 years ago
- ☆128May 27, 2023Updated 2 years ago
- Chinese natural language inference and semantic similarity datasets☆366Jan 5, 2022Updated 4 years ago
- The Corpus & Code for EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction" | FCGEC Chinese grammatical error correction corpus and STG model☆120Dec 10, 2024Updated last year
- This repo contains the code for our paper "Iterative Edit-Based Unsupervised Sentence Simplification" accepted at ACL 2020.☆14Jul 19, 2021Updated 4 years ago
- [ACL 2024] IEPile: A Large-Scale Information Extraction Corpus☆212Jan 9, 2025Updated last year
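- Natural-language features implemented on the rasa framework: entity recognition, text classification, coreference resolution, relation extraction, and more☆17May 22, 2023Updated 2 years ago
- The Panda project is an open-source overseas Chinese large language model project launched in May 2023. It explores the full technology stack of the large-model era and aims to promote innovation and collaboration in Chinese natural language processing.☆1,036Oct 19, 2023Updated 2 years ago
- alpaca Chinese instruction fine-tuning dataset☆396Mar 26, 2023Updated 3 years ago
- Ziya-LLaMA-13B is IDEA's 13-billion-parameter large-scale pretrained model based on LLaMa, capable of translation, programming, text classification, information extraction, summarization, copywriting, commonsense question answering, and mathematical calculation. The Ziya general-purpose large model has completed three training stages: large-scale pre-training, multi-task supervised fine-tuning, and learning from human feedback. This document is mainly for Ziya-…☆46Jun 9, 2023Updated 2 years ago
- [COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset☆663Jun 19, 2023Updated 2 years ago
- Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction-tuning datasets☆3,054Apr 14, 2024Updated last year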
- ☆25Aug 1, 2024Updated last year
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning☆285Aug 20, 2023Updated 2 years ago
- A collection of Chinese text correction datasets☆35Mar 24, 2026Updated last week
- A dataset used for NLP tasks.☆10Apr 17, 2021Updated 4 years ago
- Tuning LLMs with no tears💦; Sample Design Engineering (SDE) for more efficient downstream-tuning.☆1,019Apr 27, 2024Updated last year
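- (1) A rotary position embedding encoder with elastic-interval normalization plus peft LoRA quantized training, improving support for contexts on the order of ten thousand tokens. (2) Evidence-theory explanation learning to improve the model's complex logical reasoning. (3) Compatible with the alpaca data format.☆45Jul 19, 2023Updated 2 years ago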
- ☆12Feb 28, 2025Updated last year
- train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism☆224Nov 21, 2023Updated 2 years ago
- NEZHA: Neural Contextualized Representation for Chinese Language Understanding☆259Aug 13, 2021Updated 4 years ago
- aigc evals☆10Dec 2, 2023Updated 2 years ago
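- LTX-Video-Trainer-GUI is a GUI tool for training LTX video LoRA models for video generation through a simple interface. The trainer provides an intuitive GUI so that users can set up and launch training runs without writing complex code.☆13Jul 18, 2025Updated 8 months ago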
- A Chinese medical ChatGPT based on LLaMa, training from large-scale pretrain corpus and multi-turn dialogue dataset.☆390Dec 12, 2023Updated 2 years ago
- ☆24Nov 28, 2024Updated last year
- TGLS: Unsupervised Text Generation by Learning from Search☆25Jan 5, 2021Updated 5 years ago
- ☆14Feb 24, 2025Updated last year
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continued pre-training to improve …☆38May 31, 2025Updated 10 months ago
- Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models☆5,543Mar 22, 2026Updated last week
- code and data for "CSCD-NS: a Chinese Spelling Check Dataset for Native Speakers"☆82Aug 18, 2024Updated last year