YouTaoBaBa / Chinese-Dialogue-Dataset
A collection of currently available open-source Chinese dialogue datasets
☆94 · Updated last year
Related projects:
- PICA, a multi-turn empathetic dialogue model ☆83 · Updated last year
- A wide variety of research projects developed by the SpokenNLP team of Speech Lab, Alibaba Group. ☆101 · Updated 7 months ago
- Text deduplication ☆65 · Updated 3 months ago
- ☆171 · Updated 7 months ago
- ☆76 · Updated 4 months ago
- CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Conversational AI ☆201 · Updated last year
- [LREC] MMChat: Multi-Modal Chat Dataset on Social Media ☆96 · Updated last year
- flow mirror models from JZX AI Labs ☆33 · Updated this week
- Extracts dialogue datasets from novels ☆78 · Updated 3 months ago
- Alibaba Tongyi Qianwen (Qwen-7B-Chat/Qwen-7B): fine-tuning, LoRA, and inference ☆63 · Updated 4 months ago
- ☆158 · Updated 3 months ago
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF. ☆58 · Updated last year
- A Chinese medical ChatGPT based on LLaMA, trained on a large-scale pretraining corpus and a multi-turn dialogue dataset. ☆291 · Updated 9 months ago
- A tool for manually annotating and ranking response data in the RLHF stage of large-model training. ☆240 · Updated last year
- A framework for cleaning Chinese dialog data ☆259 · Updated 3 years ago
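- A large language model training and testing tool built on HuggingFace. Supports web UI and terminal inference for each model, parameter-efficient and full-parameter training (pre-training, SFT, RM, PPO, DPO), plus model merging and quantization. ☆198 · Updated 9 months ago
- Trains a Chinese vocabulary with BPE in sentencepiece and uses it in transformers. ☆107 · Updated last year
- ☆23 · Updated last year
- Instruction-tuning tool for large language models (supports FlashAttention) ☆162 · Updated 8 months ago
- A purer tokenizer with a higher compression rate ☆438 · Updated 5 months ago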
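- Chinese instruction-tuning datasets ☆112 · Updated 5 months ago
- A line-by-line annotated walkthrough of the Baichuan2 code, suitable for beginners ☆208 · Updated last year
- Chinese dialogue data cleaning ☆20 · Updated last year
- This project performs fast encoding detection and conversion on large numbers of text files to support data cleaning for the mnbvc corpus project ☆52 · Updated 2 weeks ago
- llama inference for tencentpretrain ☆95 · Updated last year
- MD5 links for a Chinese book corpus ☆209 · Updated 7 months ago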
- deep learning ☆149 · Updated 2 months ago
- ☆290 · Updated last year
- Train an LLM from scratch on a single 24 GB GPU ☆47 · Updated 2 months ago
- Alpaca Chinese Dataset: a Chinese instruction fine-tuning dataset (human + GPT4o, continuously updated) ☆161 · Updated this week