Chinese instruction tuning datasets
☆143 · Apr 10, 2024 · Updated 2 years ago
Alternatives and similar repositories for Chinese-instruction-datasets
Users that are interested in Chinese-instruction-datasets are comparing it to the libraries listed below.
- Large-scale exact string matching tool ☆17 · Mar 7, 2025 · Updated last year
- A highly efficient string matching tool that supports forward/backward maximum matching word segmentation and multi-pattern exact string matching ☆16 · Jul 29, 2023 · Updated 2 years ago
- Chinese safety prompts for evaluating and improving the safety of LLMs. ☆1,176 · Feb 27, 2024 · Updated 2 years ago
- ☆36 · Sep 6, 2024 · Updated last year
- ChatGLM-6B instruction learning | instruction data | Instruct ☆651 · Apr 10, 2023 · Updated 3 years ago
- Firefly: a large model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, … ☆6,641 · Oct 24, 2024 · Updated last year
- ☆129 · May 27, 2023 · Updated 3 years ago
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF. ☆68 · Mar 27, 2023 · Updated 3 years ago
- Chinese natural language inference and semantic similarity datasets ☆366 · Jan 5, 2022 · Updated 4 years ago
- The Corpus & Code for EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction" | FCGEC Chinese grammatical error correction corpus and STG model ☆121 · Apr 12, 2026 · Updated 2 months ago
- [ACL 2024] IEPile: A Large-Scale Information Extraction Corpus ☆216 · Jan 9, 2025 · Updated last year
- Panda is an open-source overseas Chinese large language model project launched in May 2023. It is dedicated to exploring the entire technology stack in the era of large models and aims to promote innovation and collaboration in Chinese natural language processing. ☆1,033 · Oct 19, 2023 · Updated 2 years ago
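- Retrieval model component from "网数ICT小分队", the joint runner-up (third place) team at JDDC 2019 ☆51 · Mar 24, 2023 · Updated 3 years ago
- Alpaca Chinese instruction fine-tuning dataset ☆395 · Mar 26, 2023 · Updated 3 years ago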
- [COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset ☆671 · Jun 19, 2023 · Updated 3 years ago
- Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets ☆3,050 · Apr 14, 2024 · Updated 2 years ago
- ☆25 · Aug 1, 2024 · Updated last year
- A collection of Chinese text error correction datasets ☆42 · Mar 24, 2026 · Updated 2 months ago
- A dataset used for NLP tasks. ☆10 · Apr 17, 2021 · Updated 5 years ago
- Tuning LLMs with no tears💦; Sample Design Engineering (SDE) for more efficient downstream-tuning. ☆1,014 · Apr 27, 2024 · Updated 2 years ago
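- (1) A rotary position embedding encoder with elastic interval normalization plus peft LoRA quantized training, improving performance support for contexts on the order of ten thousand tokens. (2) Evidence-theory explanation learning to improve the model's complex logical reasoning ability. (3) Compatible with the alpaca data format. ☆44 · Jul 19, 2023 · Updated 2 years ago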
- train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism ☆224 · Nov 21, 2023 · Updated 2 years ago
- aigc evals ☆10 · Dec 2, 2023 · Updated 2 years ago
- NEZHA: Neural Contextualized Representation for Chinese Language Understanding ☆259 · Aug 13, 2021 · Updated 4 years ago
- LTX-Video-Trainer-GUI is a GUI tool for training LTX video LoRA models. It supports training LoRA models for video generation through a simple interface. The trainer provides an intuitive GUI that lets users set up and launch training without writing complex code. ☆13 · Jul 18, 2025 · Updated 11 months ago
- A Chinese medical ChatGPT based on LLaMa, trained on a large-scale pretraining corpus and a multi-turn dialogue dataset. ☆394 · Dec 12, 2023 · Updated 2 years ago
- TGLS: Unsupervised Text Generation by Learning from Search ☆25 · Jan 5, 2021 · Updated 5 years ago
- Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models ☆5,576 · May 30, 2026 · Updated 3 weeks ago
- ☆43 · Mar 6, 2025 · Updated last year
- Code and data for "CSCD-NS: a Chinese Spelling Check Dataset for Native Speakers" ☆83 · Aug 18, 2024 · Updated last year
- ☆45 · Mar 6, 2026 · Updated 3 months ago
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continued pre-training to improve … ☆40 · May 31, 2025 · Updated last year
- Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard ☆4,266 · Feb 6, 2026 · Updated 4 months ago
- ASR, End-to-End, end2end, Speech Recognition, end-to-end speech recognition ☆12 · Oct 25, 2020 · Updated 5 years ago
- nCoV related sentence similarity by BERT ☆19 · Mar 18, 2020 · Updated 6 years ago
- [EMNLP 2022] ReCo: Reliable Causal Chain Reasoning via Structural Causal Recurrent Neural Networks ☆17 · Apr 24, 2024 · Updated 2 years ago
- A survey of large language model training and serving ☆37 · Aug 4, 2023 · Updated 2 years ago
- Huozi (活字) general-purpose large model ☆395 · Sep 12, 2024 · Updated last year
- ☆164 · Apr 17, 2023 · Updated 3 years ago