OctopusMind / BBPE
Low-level implementation of BBPE
☆18Updated 6 months ago
Related projects
Alternatives and complementary repositories for BBPE
- Train a Chinese vocabulary with SentencePiece BPE and use it in transformers.☆109Updated last year
- text embedding☆138Updated last year
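- A purer tokenizer with a higher compression rate☆449Updated 6 months ago
- Welcome to the "LLM-travel" repository! Explore the mysteries of large language models (LLMs) 🚀. Dedicated to deeply understanding, discussing, and implementing the techniques, principles, and applications related to large models.☆266Updated 3 months ago
- Baichuan LLM supervised fine-tuning with LoRA☆60Updated last year
- Chinese instruction-tuning datasets☆118Updated 7 months ago
- Alibaba Tongyi Qianwen (Qwen-7B-Chat/Qwen-7B): fine-tuning, LoRA, and inference☆68Updated 5 months ago
- Line-by-line annotated version of the Baichuan2 code, suitable for beginners☆209Updated last year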
- This is a repository used by individuals to experiment and reproduce the pre-training process of LLM.☆353Updated 6 months ago
- Hugging Face Transformers Course notes☆38Updated 2 years ago
- MiniRBT (a series of small Chinese pre-trained models)☆251Updated last year
- BERT-based intent and slots detector for chatbots.☆135Updated 6 months ago
- ☆88Updated 4 months ago
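- PyTorch-based Chinese intent recognition and slot filling☆141Updated 4 months ago
- A simple, easy-to-use TinyBERT: a pre-trained language model obtained by knowledge distillation from BERT☆252Updated 4 years ago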
- A framework for cleaning Chinese dialog data☆260Updated 3 years ago
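- Firefly Chinese LLaMA-2 large model, with support for incremental pre-training of Baichuan2, Llama2, Llama, Falcon, Qwen, Baichuan, InternLM, Bloom, and other large models☆396Updated last year
- A HuggingFace-based tool for training and testing large language models. Supports web UI and terminal inference for each model, low-parameter and full-parameter training (pre-training, SFT, RM, PPO, DPO), plus merging and quantization.☆202Updated 11 months ago
- Basic framework for RAG (retrieval-augmented generation)☆75Updated 10 months ago
- LLM fine-tuning: instruction fine-tuning for Qwen2 and GLM4☆207Updated 3 months ago
- Chinese natural language inference and semantic similarity datasets☆343Updated 2 years ago
- An upgraded SimBERT (SimBERTv2)!☆438Updated 2 years ago
- A collection of currently available open-source Chinese dialogue datasets☆113Updated last year
- Alpaca Chinese Dataset -- Chinese instruction fine-tuning dataset (human + GPT4o, continuously updated)☆184Updated last month
- A curated collection of open-source SFT datasets, updated as needed☆440Updated last year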
- ☆297Updated last year
- A small Chinese chat model: T5 base with supervised training on a large amount of data.☆96Updated last year
- ChatGLM on multiple GPUs with DeepSpeed and☆402Updated 4 months ago
- A small-parameter Chinese large language model implemented from scratch.☆257Updated 2 months ago
- Chinese reproduction of SimCSE, supervised and unsupervised☆265Updated 2 years ago