CAIRI-China / AwesomeLLMsDatasetsLinks
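Explore the world of large language models (LLMs). This project collects representative text datasets across five key dimensions: pre-training corpora, fine-tuning instruction datasets, preference datasets, evaluation datasets, and traditional NLP datasets, plus multimodal datasets. We aim to provide researchers and developers with the most comprehensive resources to advance the development and application of AI technology.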
☆17Updated last year
Alternatives and similar repositories for AwesomeLLMsDatasets
Users who are interested in AwesomeLLMsDatasets are comparing it to the repositories listed below
- Mainly records multimodal knowledge for large language model (LLM) algorithm (application) engineers☆206Updated last year
- ☆22Updated 2 months ago
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆216Updated 4 months ago
- Real-time updated, fine-grained reading list on LLM-synthetic-data.🔥☆262Updated 5 months ago
- ☆223Updated last year
- This is the repository for the Tool Learning survey.☆395Updated last month
- Generate dialogue datasets from documents using LLMs such as ChatGLM2 or ChatGPT☆158Updated last year
- ☆254Updated 6 months ago
- ☆541Updated 5 months ago
- Agentic RAG R1 Framework via Reinforcement Learning☆235Updated last month
- A multi-dimensional Chinese alignment evaluation benchmark for large language models (ACL 2024)☆395Updated 10 months ago
- GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation☆217Updated this week
- Build an LLM ReAct agent framework from scratch☆78Updated 8 months ago
- ☆246Updated 2 weeks ago
- ☆30Updated 10 months ago
- This is the reading list for the survey "A Survey on the Optimization of LLM-based Agents". We will keep adding papers and improving the…☆115Updated last month
- Train an LLM from scratch on a single 24 GB GPU☆56Updated last month
- Train GRPO with zero dataset and low resources; 8-bit/4-bit/LoRA/QLoRA and multi-GPU supported ...☆73Updated 2 months ago
- personal chatgpt☆373Updated 6 months ago
- A curated collection of open-source SFT datasets, updated continuously☆522Updated 2 years ago
- YuLan: An Open-Source Large Language Model☆627Updated 5 months ago
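- This project collects open-source datasets for table intelligence tasks (such as table question answering and table-to-text generation), converts the raw data into instruction-tuning format, and fine-tunes LLMs on it to strengthen their understanding of tabular data, with the ultimate goal of building a large language model specialized for table intelligence tasks.☆596Updated last year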
- Full-parameter, LoRA, and QLoRA fine-tuning of Llama 3.☆200Updated 8 months ago
- LLMs interview notes and answers: this repository mainly records interview questions and reference answers for large language model (LLM) algorithm engineers☆71Updated last year
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆340Updated 2 months ago
- llm & rl☆155Updated this week
- This is a repository for individuals to experiment with and reproduce the pre-training process of LLMs.☆444Updated last month
- FlagEval is an evaluation toolkit for AI large foundation models.☆337Updated 2 months ago
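- Welcome to the "LLM-travel" repository! Explore the mysteries of large language models (LLMs) 🚀. Dedicated to deeply understanding, discussing, and implementing the techniques, principles, and applications related to large models.☆327Updated 11 months ago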
- A tool for manually annotating and ranking response data in the RLHF stage of large language models.☆253Updated last year