jdh-algo / JoyDataForge
A data synthesis tool for simply and efficiently synthesizing large-model training data for different business scenarios
☆24Updated 3 months ago
Alternatives and similar repositories for JoyDataForge:
Users that are interested in JoyDataForge are comparing it to the libraries listed below
- Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support☆107Updated last month
- ☆51Updated 7 months ago
- 🌐 WebThinker: Empowering Large Reasoning Models with Deep Research Capability☆158Updated this week
- Task-oriented dialogue system, especially for LLMs, containing the subtasks: (1) intent detection, (2) slot filling, (3) dialogue state tracking☆97Updated this week
- CMB, A Comprehensive Medical Benchmark in Chinese☆187Updated last month
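- This project covers dataset synthesis, model training, and evaluation for the mathematical problem-solving ability of large models, along with notes on related articles.☆83Updated 7 months ago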
- Real-time updated, fine-grained reading list on LLM-synthetic-data.🔥☆255Updated 3 months ago
- ☆405Updated this week
- A Toolkit for Table-based Question Answering☆112Updated last year
- Scaling Deep Research via Reinforcement Learning in Real-world Environments.☆282Updated 2 weeks ago
- A curated list of awesome works in Routing LLMs paradigm (👉 Welcome to submit your contributions to this code repository)☆30Updated last month
- RAG paper study☆117Updated last month
- ☆153Updated last month
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆133Updated 4 months ago
- ☆130Updated 3 months ago
- ☆97Updated last year
- ☆140Updated last year
- ☆55Updated 6 months ago
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…☆361Updated 7 months ago
- CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models☆301Updated 5 months ago
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆305Updated last week
- ☆218Updated last year
- The demo, code and data of FollowRAG☆72Updated this week
- Official Repository for SIGIR2024 Demo Paper "An Integrated Data Processing Framework for Pretraining Foundation Models"☆77Updated 8 months ago
- ☆159Updated 3 weeks ago
- A Chinese National Medical Licensing Examination dataset and large language model benchmarks☆60Updated last year
- LAiW: A Chinese Legal Large Language Models Benchmark☆79Updated 9 months ago
- Deepspeed, LLM, Medical_Dialogue, medical large models, pretraining, fine-tuning☆261Updated 10 months ago
- Awesome Agent Training☆72Updated this week
- Best practices for retrieval-augmented generation with large models.☆74Updated 7 months ago