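An in-depth exploration of the world of large language models (LLMs). This project gathers representative text datasets spanning five key dimensions: pre-training corpora, instruction fine-tuning datasets, preference datasets, evaluation datasets, and traditional NLP datasets, as well as multimodal datasets. We aim to give researchers and developers the most comprehensive resource possible, to advance the development and application of AI technology.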
☆19Apr 26, 2024Updated last year
Alternatives and similar repositories for AwesomeLLMsDatasets
Users that are interested in AwesomeLLMsDatasets are comparing it to the libraries listed below
- The latest progress of Personalized Large Language Models (LLMs).☆33Jan 7, 2026Updated last month
- OPSTL: Self-supervised Skeleton-based Action Recognition in Occluded Environments☆14Oct 25, 2023Updated 2 years ago
- ☆17Feb 6, 2025Updated last year
- An Efficient BPE Algorithm Faster than Hugging Face Tokenizer's Implementation☆13Sep 9, 2024Updated last year
- ☆16Apr 19, 2024Updated last year
- TOD-Flow: Modeling the Structure of Task-Oriented Dialogues☆13Feb 7, 2024Updated 2 years ago
- ☆10Nov 17, 2022Updated 3 years ago
- Trial version for prs platform (python project). Please note that the complete experience requires downloading the Unity resource.☆10Jun 26, 2024Updated last year
- [ICCV2023] Chaotic World: A Large and Challenging Benchmark for Human Behavior Understanding in Chaotic Events☆10Dec 7, 2024Updated last year
- Official implementation of paper "Masked Distillation with Receptive Tokens", ICLR 2023.☆10Mar 13, 2023Updated 2 years ago
- Codebase for the paper "Schema-guided User Satisfaction Modeling for Task-oriented Dialogues"☆11Aug 6, 2025Updated 7 months ago
- Machine learning course project, Computer Science, Tongji University☆10Mar 22, 2025Updated 11 months ago
- ☆14Apr 1, 2023Updated 2 years ago
- ☆17Feb 2, 2024Updated 2 years ago
- Introduction to Artificial Intelligence (honors class), second semester of the 2024-2025 academic year☆17Jun 16, 2025Updated 8 months ago
- Official code of the MSF model for GZSSAR (ICIG 2023)☆14Jan 3, 2026Updated 2 months ago
- Code for "Evaluating Spatial Understanding of Large Language Models" TMLR 2024.☆16Feb 22, 2024Updated 2 years ago
- Official implementation of the paper "STARS: Self-supervised 3D Action Recognition with Contrastive Tuning".☆13Jan 6, 2025Updated last year
- Official PyTorch Implementation of Masked Temporal Interpolation Diffusion for Procedure Planning in Instructional Videos☆11Feb 10, 2026Updated 3 weeks ago
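- Chinese word segmentation based on BPE. Optimizations: preprocessing, parallel computation, multi-character words, multiple vocabularies☆14May 14, 2022Updated 3 years ago
- A WeChat chatbot that can be managed through remote web login and can connect to large language models such as iFLYTEK Spark and ChatGPT. It uses the WeChat web protocol.☆16Feb 20, 2024Updated 2 years ago
- Word segmentation of Chinese corpora using Sentencepiece☆13Nov 30, 2023Updated 2 years ago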
- Open-source code for GEAR☆13Dec 3, 2025Updated 3 months ago
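- Isolated-word speech recognition for the digits 0-9, using a single-component GMM built on MFCC features. MFCC, GMM, sklearn, isolated word recognition.☆10Nov 18, 2020Updated 5 years ago
- Monitors Bilibili live-room data, saves it to a database in real time, and shows polished visual statistics charts on a built-in web page.☆13Jan 4, 2022Updated 4 years ago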
- Build LLM Application with Local Documents☆19Jun 13, 2025Updated 8 months ago
- Scrapes stock market data with Python, then restructures and analyzes market quotes and financial data for stocks in each industry☆11Jul 26, 2020Updated 5 years ago
- AU-Expression Knowledge Constrained Representation Learning for Facial Expression Recognition (ICRA 2021)☆11Dec 29, 2023Updated 2 years ago
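- A college student "Internet+" innovation and entrepreneurship project: "一点到家", the 云滇 home-services platform supporting rural revitalization. Front end: WeChat mini program; back end: Spring Boot; database: MySQL. A highly recommended project: the source code is simple and readable, clean and concise, thoroughly commented, and open to further development. Creative and close to everyday life, it eases employment pressure, raises farmers' incomes, and promotes…☆14Jun 17, 2023Updated 2 years ago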
- ☆14Aug 21, 2025Updated 6 months ago
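- University project collection, project one: a travel check-in project. Checking in means visiting pre-recorded, interesting travel stops and leaving your footprint there; these stops are called points of interest. In the system you can also plan your own itinerary, deciding in advance which points of interest to visit and the route, and then travel along the route planned in the system. During the trip you can write text and post pictures, and once the whole itinerary is finished…☆10Apr 27, 2018Updated 7 years ago
- Builds high-quality training datasets for large language models from video platforms such as YouTube and Bilibili and from web pages, using the 零一万物 (01.AI) large model or a local small model through ollama (support for customizable output training-data formats is planned)☆19May 2, 2024Updated last year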
- ☆19Jul 7, 2024Updated last year
- Official implementation of ECCV 2024 paper: Take A Step Back: Rethinking the Two Stages in Visual Reasoning☆14Jun 1, 2025Updated 9 months ago
- ☆18Oct 19, 2024Updated last year
- Text content safety moderation based on TensorFlow☆20Aug 15, 2024Updated last year
- Complete Reinforcement Learning Toolkit for Large Language Models!☆21Aug 2, 2025Updated 7 months ago
- Aurora forecasts created from solar wind data (OVATION Prime 2010)☆20Apr 11, 2025Updated 10 months ago
- ☆16May 14, 2024Updated last year