FunnySaltyFish / bilibili_comments_crawl

Builds a dialogue dataset for training large language models from Bilibili comment-section data

☆52

Alternatives and similar repositories for bilibili_comments_crawl

Users who are interested in bilibili_comments_crawl are comparing it to the repositories listed below

FunnySaltyFish / Better-Ruozhiba
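[Item-by-item processing complete] A QA dataset of selected Ruozhiba (弱智吧) questions, with every entry manually reviewed and revised
☆216 Updated 4 months ago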
KMnO4-zx / extract-dialogue
Extracts dialogue datasets from novels
☆230 Updated last year
KMnO4-zx / huanhuan-chat
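Chat-甄嬛 is a chat language model that imitates Zhen Huan's tone, obtained by LoRA fine-tuning ChatGLM2 on all of Zhen Huan's lines and sentences from the script of 《甄嬛传》 (Empresses in the Palace).
☆711 Updated 2 months ago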
Zeyi-Lin / LLM-Finetune
Large language model fine-tuning: instruction fine-tuning for Qwen2VL, Qwen2, and GLM4
☆464 Updated 2 months ago
zhaibowen / Retriever
Retriever-0.1B
☆93 Updated last year
yichen-byte / medical-chatbot
A traditional Chinese medicine question-answering bot fine-tuned from the ChatGLM3 base model with the LLAMA-Factory framework
☆95 Updated last year
Tongjilibo / build_MiniLLM_from_scratch
Build a MiniLLM from scratch (pretrain+sft+dpo, work in progress)
☆462 Updated 4 months ago
puppyapple / Chinese_LLM_From_Scratch
☆29 Updated 10 months ago
liuzard / transformers_zh_docs
Chinese documentation for Huggingface transformers
☆267 Updated last year
charent / Phi2-mini-Chinese
Phi2-Chinese-0.2B: train your own small Chinese Phi2 chat model from scratch, with support for plugging into langchain to load a local knowledge base for retrieval-augmented generation (RAG).
☆563 Updated last year
AI-Study-Han / Zero-Chatgpt
Runs through the chatgpt technical roadmap from scratch.
☆250 Updated 11 months ago
ExpressGit / NLP_Study_Demo
NLP_Study_Demo
☆160 Updated last year
peilongchencc / My-LLaMA-Factory
Notes on experience using LLaMA-Factory
☆35 Updated 11 months ago
Tongyi-EconML / FinQwen
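FinQwen: a project dedicated to building an open, stable, high-quality financial large model, building an intelligent question-answering system for financial scenarios on top of large models, and using open source to advance "AI + finance".
☆402 Updated last year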
qiufengqijun / mini_qwen
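A project for training a large language model from scratch, covering pre-training, fine-tuning, and direct preference optimization; the model has 1B parameters and supports Chinese and English.
☆546 Updated 5 months ago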
owenliang / qwen-vllm
Demo of VLLM inference deployment for Tongyi Qianwen
☆595 Updated last year
morettt / Chatbot-Trainer
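Designed for beginners! Chatbot Trainer is a chatbot training project based on an open-source language model (GLM4). You can easily train a chatbot with your own tone and personality, or train any character you are interested in, including celebrities, historical figures, anime characters, or fictional characters from films and novels. With the project's built-in guide to creating question-answer pairs for the dataset, you…
☆42 Updated 7 months ago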
thu-coai / CharacterGLM-6B
[EMNLP'24] CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models
☆469 Updated 7 months ago
charent / ChatLM-mini-Chinese
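A small 0.2B Chinese dialogue model (ChatLM-Chinese-0.2B), with all dataset sources open-sourced along with the full code for data cleaning, tokenizer training, model pre-training, SFT instruction fine-tuning, RLHF optimization, and other steps. Supports sft fine-tuning for downstream tasks, with a fine-tuning example for triple information extraction.
☆1,573 Updated last year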
huang1332 / finetune_dataset_maker
A fine-tuning dataset generation tool designed for ChatGLM; come make your own catgirl.
☆608 Updated last year
jiahe7ay / MiniCharacterLLM
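A project for getting a small-parameter large model to role-play with one click; everything from data composition to training is included in the project.
☆24 Updated last year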
wangxb96 / RAG-QA-Generator
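RAG-QA-Generator is an automated knowledge base construction and management tool for retrieval-augmented generation (RAG) systems. It reads document data, uses a large language model to generate high-quality question-answer pairs (QA pairs), and inserts them into a database, automating the construction and management of the RAG system's knowledge base.
☆219 Updated 7 months ago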
lansinuote / Simple_RLHF
☆93 Updated last month
km1994 / AwesomeNLP
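This project completes all the tasks of NLP-Beginner: introductory natural language processing exercises (text classification, information extraction, knowledge graphs, machine translation, question answering, text generation, Text-to-SQL, text correction, text mining, knowledge distillation, model acceleration, OCR, TTS, Prompt, embedding, etc.); all code has been tested…
☆208 Updated last year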
chg0901 / Honor_of_Kings_Multi-modal_Dataset
A Multi-modal RAG Project with Dataset from Honor of Kings, one of the most popular smartphone games in China
☆66 Updated 11 months ago
HIT-SCIR / huozi
Huozi (活字) general-purpose large model
☆393 Updated 10 months ago
jiahe7ay / MINI_LLM
This is a repository used by individuals to experiment and reproduce the pre-training process of LLM.
☆460 Updated 3 months ago
phbst / tinyRAG
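RAG interest group: a RAG application written entirely by hand. Most of Langchain's libraries are convenient, but you may not understand the principles behind them, so the code exposes the basic algorithms as much as possible, with the focus on understanding how RAG works.
☆232 Updated 10 months ago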
sunyongdi / llm_classification
Text classification with large models
☆73 Updated 11 months ago
open-chinese / alpaca-chinese-dataset
Alpaca Chinese Dataset -- Chinese instruction fine-tuning dataset
☆213 Updated 10 months ago