pluveto / bpe_v3
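Chinese word segmentation based on BPE. Optimizations: preprocessing, parallel computation, multi-character words, multiple vocabularies
☆12 · Updated 2 years ago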
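For readers new to BPE, the core merge loop behind this kind of segmenter can be sketched as follows. This is a generic, minimal byte-pair-encoding sketch, not bpe_v3's code. It assumes each sentence starts as a sequence of single Chinese characters (no whitespace pre-splitting) and leaves out the optimizations listed above (preprocessing, parallelism, multiple vocabularies). The function names `learn_bpe`, `merge_pair`, and `apply_bpe` are hypothetical.

```python
from collections import Counter


def merge_pair(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with the joined token."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of sentences.

    Assumption (hypothetical sketch): every sentence starts as single characters.
    """
    seqs = [tuple(sentence) for sentence in corpus]
    merges = []
    for _ in range(num_merges):
        # Count adjacent token pairs across the whole corpus.
        pairs = Counter()
        for seq in seqs:
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        # Most frequent pair; ties go to the pair seen first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        seqs = [merge_pair(seq, best) for seq in seqs]
    return merges


def apply_bpe(text, merges):
    """Segment `text` by replaying the learned merges in order."""
    seq = tuple(text)
    for pair in merges:
        seq = merge_pair(seq, pair)
    return list(seq)


if __name__ == "__main__":
    corpus = ["自然语言处理", "自然语言理解", "语言模型"]
    merges = learn_bpe(corpus, 3)
    print(merges)  # [('语', '言'), ('自', '然'), ('自然', '语言')]
    print(apply_bpe("自然语言处理", merges))  # ['自然语言', '处', '理']
```

Here the most frequent pair, 语言, is merged first, and repeated merges build multi-character words such as 自然语言. Production tokenizers add tie-breaking rules, frequency thresholds, and faster pair counting on top of this loop.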
Alternatives and similar repositories for bpe_v3:
Users who are interested in bpe_v3 are comparing it to the libraries listed below:
- text embedding ☆144 · Updated last year
- an implementation of transformer, bert, gpt, and diffusion models for learning purposes ☆153 · Updated 6 months ago
- How to train an LLM tokenizer ☆144 · Updated last year
- Code for a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models ☆62 · Updated 2 months ago
- Chinese natural language inference and semantic similarity datasets ☆349 · Updated 3 years ago
- ☆67 · Updated last year
- An implementation of the BERT model and its related downstream tasks based on the PyTorch framework. @月来客栈 ☆593 · Updated last month
- Python ROUGE Score Implementation for Chinese Language Task (official rouge score) ☆100 · Updated 10 months ago
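- ChatGLM-6B with an added RLHF implementation and line-by-line explanations of some of the core code; the examples cover short news headline generation and an RLHF implementation for recommendation given a specified context ☆82 · Updated last year
- Firefly Chinese LLaMA-2 large model; supports incremental pre-training of Baichuan2, Llama2, Llama, Falcon, Qwen, Baichuan, InternLM, Bloom, and other large models ☆410 · Updated last year
- Welcome to the "LLM-travel" repository! Explore the mysteries of large language models (LLMs) 🚀. Dedicated to understanding in depth, discussing, and implementing the techniques, principles, and applications related to large models. ☆316 · Updated 9 months ago
- ☆70 · Updated 2 months ago
- PyTorch implementation of the PaddleNLP UIE model ☆628 · Updated last year
- A curated list of research papers in Sentence Representation Learning and an STS leaderboard of sentence embeddings. ☆315 · Updated last year
- T2Ranking: A large-scale Chinese benchmark for passage ranking. ☆157 · Updated last year
- Fine-tuning ChatGLM with LoRA. ☆49 · Updated last year
- A purer tokenizer with a higher compression rate ☆475 · Updated 5 months ago
- Chinese instruction tuning datasets ☆129 · Updated last year
- This project completes all the tasks in NLP-Beginner: Introductory Exercises in Natural Language Processing (text classification, information extraction, knowledge graphs, machine translation, question answering, text generation, Text-to-SQL, text error correction, text mining, knowledge distillation, model acceleration, OCR, TTS, Prompt, embedding, etc.); all code has been tested… ☆194 · Updated last year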
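- CIKM2023 Best Demo Paper Award. HugNLP is a unified and comprehensive NLP library based on HuggingFace Transformer. Please hugging for NL… ☆389 · Updated last year
- ☆61 · Updated last year
- Implementation of Chinese ChatGPT ☆287 · Updated last year
- A large language model training and testing tool built on HuggingFace. Supports web UI and terminal inference for each model, parameter-efficient and full-parameter training (pre-training, SFT, RM, PPO, DPO), and model merging and quantization. ☆216 · Updated last year
- ☆108 · Updated 9 months ago
- ☆47 · Updated 8 months ago
- PyTorch implementation of GlobalPointer for unified handling of nested and non-nested NER ☆392 · Updated 2 years ago
- Focused on Chinese large language models deployed to a specific industry or domain, to serve as an industry model or a company-level or industry-level domain model. ☆118 · Updated last month
- 3,000,000+ semantic understanding and matching datasets. Usable for unsupervised contrastive learning, semi-supervised learning, and similar methods to build the best-performing pre-trained models for Chinese. ☆294 · Updated 2 years ago
- This is a repository used by individuals to experiment and reproduce the pre-training process of LLM. ☆424 · Updated 3 weeks ago
- PyTorch distributed training ☆65 · Updated last year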