CAIRI-China/AwesomeLLMsDatasets

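Dive into the world of large language models (LLMs). This project gathers representative text datasets spanning five key dimensions: pre-training corpora, fine-tuning instruction datasets, preference datasets, evaluation datasets, traditional NLP datasets, and multimodal datasets. We aim to give researchers and developers the most comprehensive resources possible, to advance the development and application of artificial intelligence.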

☆21

Alternatives and similar repositories for AwesomeLLMsDatasets

Users that are interested in AwesomeLLMsDatasets are comparing it to the libraries listed below.

PRS-Organization / PRS-Trial-Version
Trial version of the PRS platform (Python project). Please note that the complete experience requires downloading the Unity resource.
☆10 · Jun 26, 2024 · Updated 2 years ago
SCIR-SC-Qiaoban-Team / FreeEvalLM
[AAAI26] Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilitie…
☆11 · Feb 7, 2026 · Updated 5 months ago
albertwy / GPT-4V-Evaluation
Data for evaluating GPT-4V
☆11 · Oct 26, 2023 · Updated 2 years ago
cyfml / OPSTL
OPSTL: Self-supervised Skeleton-based Action Recognition in Occluded Environments
☆14 · Oct 25, 2023 · Updated 2 years ago
OnlyAR / RAL-Writer
The code and data for the paper "Lost-in-the-Middle in Long-Text Generation: Synthetic Dataset, Evaluation Framework, and Mitigation"
☆15 · Oct 8, 2025 · Updated 9 months ago
zhangxy-2019 / sgp-tod
☆14 · Aug 21, 2025 · Updated 11 months ago
yanivle / fast_minbpe
☆18 · Feb 6, 2025 · Updated last year
Megum1 / UNIT
[ECCV'24] UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening
☆10 · Dec 18, 2025 · Updated 7 months ago
VanillaCreamer / Awesome-Personalized-LLMs
The latest progress of Personalized Large Language Models (LLMs).
☆52 · Updated this week
sxysxy / Taurix
Taurix OS kernel. A hands-on (read: haphazardly written) exercise in operating system principles.
☆12 · Dec 20, 2020 · Updated 5 years ago
AoiroRed / Library
Library of some notes
☆10 · May 29, 2023 · Updated 3 years ago
Pchen0 / Web-Wechat-Bot
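A WeChat chatbot that can be logged into and managed remotely through a web page and can connect to large language models such as iFlytek Spark and ChatGPT. It uses the WeChat web protocol.
☆16 · Feb 20, 2024 · Updated 2 years ago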
cszhangyi / NewsApp
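NewsApp includes the client source code, server source code, and database files. It is built on the Recognition Emotion feature of Microsoft's AI project ProjectOxford and pushes different categories of news based mainly on the user's facial expression. For the Emotion API, see: https://www.p…
☆10 · Mar 2, 2016 · Updated 10 years ago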
DezhiKong00 / Sentencepiece-chinese-bbpe
Word segmentation of Chinese corpora using Sentencepiece
☆13 · Nov 30, 2023 · Updated 2 years ago
sutdcv / Chaotic-World
[ICCV2023] Chaotic World: A Large and Challenging Benchmark for Human Behavior Understanding in Chaotic Events
☆10 · Dec 7, 2024 · Updated last year
EliasCai / shanghai-citywalk
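Built on foundational data such as A-rated scenic spots, historic sites, and protected cultural heritage sites, the "城语" app uses advanced large-model capabilities for intelligent Citywalk route planning. It has three core functions: designing a route, generating a route guide, and generating reasons to recommend each attraction. Using large models reduces the manual editing and recommendation workload and allows personalized customization based on visitors' needs, improving visitors'…
☆19 · Feb 20, 2024 · Updated 2 years ago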
weiliang822 / ML-BigHW
Major machine learning course project, Computer Science, Tongji University
☆10 · Mar 22, 2025 · Updated last year
pluveto / bpe_v3
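Chinese word segmentation implemented with BPE. Optimizations: preprocessing, parallel computation, multi-character words, multiple vocabularies
☆14 · May 14, 2022 · Updated 4 years ago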
Guan-JW / GMM-Isolated-Speech-Recognition
Isolated-word speech recognition for the digits 0-9 using single-component GMMs built on MFCC features. MFCC, GMM, sklearn, isolated word recognition.
☆10 · Nov 18, 2020 · Updated 5 years ago
herbertskyper / yuketang_grab_answer
Scrapes answers from Yuketang (雨课堂)
☆16 · Nov 21, 2024 · Updated last year
wertycn / hexo-deployer-qcloud-cos
Usage guide for hexo-deployer-qcloud-cos, a one-click tool for deploying Hexo to Tencent Cloud COS
☆19 · Feb 27, 2022 · Updated 4 years ago
haidequanbu / ESC-Eval
[EMNLP 2024] "ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models"
☆27 · Jun 24, 2024 · Updated 2 years ago
sakurayun / bili-live-monitor
Monitors Bilibili live-room data, saves it to a database in real time, and displays polished visual statistics charts on a built-in web page.
☆12 · Jan 4, 2022 · Updated 4 years ago
Charmve / EmotionCube
🐾 EmotionCube: An intelligent companion robot based on expression recognition and intelligent speech.
☆19 · May 27, 2024 · Updated 2 years ago
6zHAOyi / BadVision
This is an official code repository for the CVPR 2025 paper BadVision.
☆15 · Nov 18, 2025 · Updated 8 months ago
qinghuannn / ChainHOI
☆15 · May 1, 2025 · Updated last year
Nimolty / RoboKeyGen
☆19 · Jul 7, 2024 · Updated 2 years ago
dutzxf1993 / stock-data-analysis
Scrapes stock market data with Python, then restructures and analyzes stock quotes and financial data across industries
☆10 · Jul 26, 2020 · Updated 6 years ago
Philosober / AI-fundamentals-2025-Spring
Introduction to Artificial Intelligence (honors class), second half of the 2024-2025 academic year
☆17 · Jun 16, 2025 · Updated last year
SuchScar / FootPrint
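University project roundup, part one: a travel check-in project. Checking in means visiting interesting, pre-recorded travel stops one by one and leaving your footprint there; we call these stops points of interest. In the system you can also plan your own trip, deciding in advance which points of interest to visit and by what route, and then travel according to the system's route plan. During the trip you can write some text and post some pictures, and the whole trip…
☆10 · Apr 27, 2018 · Updated 8 years ago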
Raj-08 / Q-Flow
Complete Reinforcement Learning Toolkit for Large Language Models!
☆21 · Aug 2, 2025 · Updated 11 months ago
helioforecast / auroramaps
Aurora forecasts created from solar wind data (OVATION Prime 2010)
☆20 · Apr 11, 2025 · Updated last year
Bowenwu1 / JIAN-JIAN
An excellent book-excerpt WeChat mini program; a top-ten entry in the Sun Yat-sen University Software Innovation Competition
☆16 · May 3, 2018 · Updated 8 years ago
gnp / minbpe-rs
Port of Andrej Karpathy's minbpe to Rust
☆32 · May 6, 2024 · Updated 2 years ago
kyegomez / Aurora
Implementation of the paper: "Aurora: A Foundation Model of the Atmosphere" in PyTorch
☆24 · Updated this week
1599570912 / IELTS-Speaking-AI
🎤 AI-powered IELTS speaking practice app: real-time speech recognition, intelligent scoring, and history management
☆25 · Jun 20, 2025 · Updated last year
czhaneva / SkeleMixCLR
This is the official implementation for SkeleMixCLR
☆18 · Jul 8, 2022 · Updated 4 years ago
paolomandica / HYSP
Official PyTorch implementation of the paper "Hyperbolic Self-paced Learning for Self-supervised Skeleton-based Action Representations" (…
☆22 · Jun 21, 2024 · Updated 2 years ago
Ultraopxt / Image-classification-and-prediction-of-skin-cancer
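Skin cancer image classification and prediction on a Kaggle dataset, based on PyTorch, using balanced sampling, data preprocessing, data augmentation, DenseNet, the Adam optimizer, cross-entropy loss, and other techniques. Results: precision 0.9, recall 0.85 (trained for only 4 epochs)
☆22 · May 7, 2020 · Updated 6 years ago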