CAIRI-China / AwesomeLLMsDatasets
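Dive into the world of large language models (LLMs). This project gathers representative text datasets across five key dimensions (pre-training corpora, instruction fine-tuning datasets, preference datasets, evaluation datasets, and traditional NLP datasets), as well as multimodal datasets. We aim to give researchers and developers the most comprehensive resources for advancing the development and application of AI technology.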
☆18 Updated last year
Alternatives and similar repositories for AwesomeLLMsDatasets
Users who are interested in AwesomeLLMsDatasets compare it to the repositories listed below.
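- RAG paper study ☆167 Updated 5 months ago
- TinyRAG ☆333 Updated 2 months ago
- RAG interest group: a fully hand-written RAG application. Most of the LangChain libraries are convenient, but you may not understand the principles behind them, so the code exposes the basic algorithms as far as possible, with the focus on understanding how RAG works ☆234 Updated 11 months ago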
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning ☆775 Updated last month
- Minimal-cost training of 0.5B R1-Zero ☆767 Updated 3 months ago
- Agentic RAG R1 Framework via Reinforcement Learning ☆291 Updated 3 months ago
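- Building an agent framework from scratch (Build LLM ReAct Agent from scratch) ☆91 Updated 10 months ago
- Building a MiniLLM from 0 to 1 (pretrain + SFT + DPO, in progress) ☆468 Updated 5 months ago
- Welcome to LLM-Dojo, an open-source place for learning about large models. It uses concise, readable code to build a model training framework (supporting mainstream models such as Qwen, Llama, and GLM), an RLHF framework (DPO/CPO/KTO/PPO), and other features. 👩🎓👨🎓 ☆858 Updated 2 weeks ago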
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆370 Updated 4 months ago
- This is the repository for the Tool Learning survey. ☆431 Updated last month
- A repository for individuals to experiment with and reproduce the LLM pre-training process. ☆470 Updated 4 months ago
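- Full-parameter fine-tuning, LoRA fine-tuning, and QLoRA fine-tuning of Llama 3. ☆209 Updated 11 months ago
- ☆260 Updated 9 months ago
- A reproduction of open-r1 that trains 0.5B, 1.5B, 3B, and 7B Qwen models with GRPO and reports some interesting observations. ☆44 Updated 4 months ago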
- ☆96 Updated 2 months ago
- ☆99 Updated 6 months ago
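- This project collects open-source datasets for table intelligence tasks (such as table question answering and table-to-text generation), converts the raw data into instruction fine-tuning format, and fine-tunes LLMs on it to strengthen their understanding of tabular data, with the goal of building a large language model dedicated to table intelligence tasks. ☆610 Updated last year
- ☆23 Updated 4 months ago
- Quick start with RAG and private deployment ☆204 Updated last year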
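- ☆325 Updated 2 months ago
- This repository records reading notes on top-conference papers relevant to LLM algorithm engineers (multimodal, PEFT, few-shot QA, RAG, LMMs interpretability, Agents, CoT) ☆352 Updated last year
- A very, very small RAG system ☆283 Updated 4 months ago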
- GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation ☆332 Updated last week
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. ☆576 Updated 4 months ago
- CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models ☆328 Updated 3 months ago
- ☆546 Updated 8 months ago
- A small open-source 3D agent simulator based on LLMs. ☆69 Updated 9 months ago
- ☆419 Updated 7 months ago
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning ☆626 Updated last month