CAIRI-China / AwesomeLLMsDatasets
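A deep dive into the world of large language models (LLMs). This project gathers representative text datasets organized along the following key dimensions: pre-training corpora, fine-tuning instruction datasets, preference datasets, evaluation datasets, traditional NLP datasets, and multimodal datasets. We aim to give researchers and developers a comprehensive set of resources to advance the development and application of AI technology.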
☆19Apr 26, 2024Updated last year
Alternatives and similar repositories for AwesomeLLMsDatasets
Users that are interested in AwesomeLLMsDatasets are comparing it to the libraries listed below
- TOD-Flow: Modeling the Structure of Task-Oriented Dialogues☆13Feb 7, 2024Updated 2 years ago
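- Tongji University Computer Science machine learning course project☆10Mar 22, 2025Updated 10 months ago
- Taurix OS kernel: hands-on practice of operating system principles (written rather haphazardly)☆12Dec 20, 2020Updated 5 years ago
- Introduction to Artificial Intelligence (honors class), second semester of the 2024-2025 academic year☆17Jun 16, 2025Updated 7 months ago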
- ☆14Apr 1, 2023Updated 2 years ago
- Code for "Evaluating Spatial Understanding of Large Language Models" TMLR 2024.☆16Feb 22, 2024Updated last year
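- Chinese word segmentation based on BPE. Optimizations: preprocessing, parallel computation, multi-character words, multiple vocabularies☆14May 14, 2022Updated 3 years ago
- Monitors Bilibili live-room data, saves it to a database in real time, and displays polished statistical charts on a built-in web page.☆12Jan 4, 2022Updated 4 years ago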
- The code and data for the paper "Lost-in-the-Middle in Long-Text Generation: Synthetic Dataset, Evaluation Framework, and Mitigation"☆13Oct 8, 2025Updated 4 months ago
- Open-source code for GEAR☆13Dec 3, 2025Updated 2 months ago
- Library of some notes☆10May 29, 2023Updated 2 years ago
- Scrapes stock market data with Python, then restructures and analyzes stock quotes and financial data across industries☆11Jul 26, 2020Updated 5 years ago
- Unsupervised domain adaptation for object detection based on adversarial learning, implemented with the mmdetection framework☆15Jan 13, 2024Updated 2 years ago
- A great book-excerpt app, built as a WeChat mini program. Top-ten entry in the Sun Yat-sen University Software Innovation Competition☆16May 3, 2018Updated 7 years ago
- Replaces the current round-robin scheduler in xv6 with a lottery scheduler☆13Oct 19, 2019Updated 6 years ago
- ☆29Dec 6, 2025Updated 2 months ago
- Integrating Large Weather Models with Data Assimilation☆22Jun 2, 2024Updated last year
- Introduction to Data Science and Engineering - 2023 Autumn☆26Jan 2, 2024Updated 2 years ago
- 2025 Huawei Software Elite Challenge: Best Large Model Application Award in the grand finals☆38Apr 22, 2025Updated 9 months ago
- Usage guide for hexo-deployer-qcloud-cos, a one-click tool for deploying Hexo to Tencent Cloud COS☆19Feb 27, 2022Updated 3 years ago
- PyTorch implementation of SSRNet for age and gender estimation☆16Sep 2, 2020Updated 5 years ago
- [EMNLP 2024] "ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models"☆26Jun 24, 2024Updated last year
- Code for the paper "Task-Optimized Adapters for an End-to-End Dialogue System"☆22Jul 31, 2023Updated 2 years ago
- 🔧 AutoProtégé is a Python library for working with Protégé, the open-source ontology construction tool from Stanford University. It supports custom solutions for automated mapping and management of large-scale knowledge.☆28Feb 18, 2023Updated 2 years ago
- Test of Neural Compute Stick on YOLO and SSD face detection models (Desktop or Raspberry Pi, NCSDK2 or OpenVINO)☆32Jan 8, 2019Updated 7 years ago
- ☆28May 22, 2023Updated 2 years ago
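- Computer science graduation project: an SSM-based travel website system (source code + database + accompanying thesis + slides). Java graduation project, WeChat mini program based, Android-based graduation project, machine learning, big data graduation project, Python+Django+Vue, PHP, Flask, Node.js, Spring Boot Vue, SSM, JSP…☆34Dec 15, 2025Updated last month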
- ☆28Jan 13, 2024Updated 2 years ago
- [ACMMM 23] Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization☆29Nov 4, 2023Updated 2 years ago
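- This data analysis is based on an Alibaba Cloud Tianchi dataset (a user behavior dataset). It uses a conversion funnel and the AARRR model to analyze common e-commerce metrics, including conversion rate, PV, UV, retention rate, and repurchase rate. Python is used for data cleaning and visualization.☆32May 14, 2020Updated 5 years ago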
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, with PyTorch/CUDA☆42Feb 27, 2024Updated last year
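- Database principles course project: flight ticket booking information system☆33Aug 20, 2019Updated 6 years ago
- IELTS study materials, including difficult vocabulary from the book 词汇真经 (organized by topic), essay sentence patterns and phrases, spoken expressions, and more☆42Mar 23, 2025Updated 10 months ago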
- ☆35Mar 14, 2023Updated 2 years ago
- EMNLP 2023☆42Mar 13, 2024Updated last year
- A project that performs face detection on images or webcam input, based on OpenCV and deep learning☆38Nov 3, 2018Updated 7 years ago
- Programming Practice (summer semester 2021)☆32Aug 22, 2021Updated 4 years ago
- LLM Tokenizer with BPE algorithm☆47May 7, 2024Updated last year
- ☆37Mar 28, 2020Updated 5 years ago