jdh-algo / JoyDataForge
A data synthesis tool that simply and efficiently synthesizes large-model training data for different business scenarios
☆24Updated 5 months ago
Alternatives and similar repositories for JoyDataForge
Users who are interested in JoyDataForge are comparing it to the libraries listed below
- Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support☆133Updated 3 months ago
- Local DeepSearch (Advantage: Low Threshold): an implementation of Agentic RAG based on DeepSeek-R1 API and Tavily API☆11Updated this week
- CMB, A Comprehensive Medical Benchmark in Chinese☆200Updated 2 months ago
- Real-time updated, fine-grained reading list on LLM-synthetic-data.🔥☆262Updated 5 months ago
- ☆241Updated 2 weeks ago
- ☆53Updated 9 months ago
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…☆373Updated 9 months ago
- ☆111Updated 11 months ago
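- DeepSpeed, LLM, Medical_Dialogue, medical large models, pre-training, fine-tuning☆273Updated last year
- A tool for manually annotating and ranking response data in the RLHF stage of large-model training.☆252Updated last year
- This project covers dataset synthesis, model training, and evaluation for the mathematical problem-solving ability of large models, along with notes on related articles.☆88Updated 9 months ago
- Chinese large-model fine-tuning (LLM-SFT), math instruction dataset MWP-Instruct, supported models (ChatGLM-6B, LLaMA, Bloom-7B, baichuan-7B), supports (LoRA, QLoRA, DeepSpeed, UI, TensorboardX), supports (微…☆203Updated last year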
- ☆222Updated last year
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆144Updated 6 months ago
- A Chinese medical ChatGPT based on LLaMa, trained on a large-scale pretraining corpus and a multi-turn dialogue dataset.☆364Updated last year
- ☆63Updated last month
- How to train an LLM tokenizer☆150Updated last year
- ☆141Updated last year
- PromptCBLUE: a large-scale instruction-tuning dataset for multi-task and few-shot learning in the medical domain in Chinese☆370Updated last year
- This is a repository used by individuals to experiment with and reproduce the pre-training process of LLMs.☆441Updated last month
- Chinese large language model evaluation, phase 2☆70Updated last year
- ☆109Updated 7 months ago
- Train an LLM from scratch on a single 24 GB GPU☆55Updated last month
- A Chinese National Medical Licensing Examination dataset and large language model benchmarks☆66Updated last year
- Exploring the application potential of LLMs in the legal industry☆90Updated 6 months ago
- ☆142Updated 11 months ago
- Qwen DianJin: LLMs for the Financial Industry by Alibaba Cloud☆113Updated last month
- The code repository for the paper "TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities"☆20Updated 6 months ago
- A Toolkit for Table-based Question Answering☆112Updated last year
- Official Repository for SIGIR2024 Demo Paper "An Integrated Data Processing Framework for Pretraining Foundation Models"☆81Updated 9 months ago