taishan1994/sentencepiece_chinese_bpe

taishan1994 / sentencepiece_chinese_bpe

Train a Chinese vocabulary with BPE in sentencepiece and use it in transformers.

☆118

Alternatives and similar repositories for sentencepiece_chinese_bpe

Users who are interested in sentencepiece_chinese_bpe are comparing it to the libraries listed below.

yangjianxin1 / LLMPruner
View on GitHub
☆309 · Apr 6, 2023 · Updated 3 years ago
taishan1994 / PPO_Chinese_Generate
View on GitHub
☆11 · May 2, 2023 · Updated 3 years ago
yanqiangmiffy / how-to-train-tokenizer
View on GitHub
How to train an LLM tokenizer
☆152 · Jul 13, 2023 · Updated 3 years ago
taishan1994 / chinese_llm_sft
View on GitHub
Fine-tune large language models with instruction tuning.
☆11 · Jun 28, 2023 · Updated 3 years ago
Alibaba-NLP / RankingGPT
View on GitHub
Code for the paper "RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement"
☆34 · Jan 9, 2024 · Updated 2 years ago
chenzen94 / debug-deepspeed-chat
View on GitHub
Debug DeepSpeed-Chat step by step in an IDE
☆10 · Apr 17, 2023 · Updated 3 years ago
LouChao98 / nner_as_parsing
View on GitHub
☆16 · Mar 22, 2023 · Updated 3 years ago
Jarviswx / tonghuashun_text_matching
View on GitHub
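Tonghuashun (同花顺) Algorithm Challenge Platform: [September-October bimonthly contest] Cross-domain transfer for text semantic matching
☆11 · Oct 28, 2021 · Updated 4 years ago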
zejunwang1 / bloom_tuning
View on GitHub
Instruction fine-tuning for BLOOM models
☆24 · Jun 15, 2023 · Updated 3 years ago
hiyoung123 / Chinese-Text-Classification-Pytorch
View on GitHub
A PyTorch-based scaffold for Chinese text classification, with a comparison of commonly used models.
☆18 · Apr 23, 2021 · Updated 5 years ago
sirimullalab / KinasepKipred
View on GitHub
Model to predict kinase-ligand pKi values.
☆12 · Jul 6, 2023 · Updated 3 years ago
CLUEbenchmark / SuperCLUE-Llama2-Chinese
View on GitHub
Comprehensive evaluation of Chinese versions of the open-source Llama2 models, based on SuperCLUE's OPEN benchmark
☆128 · Aug 2, 2023 · Updated 2 years ago
MorenoLaQuatra / vad
View on GitHub
Simple voice activity detection (VAD) algorithm in Python
☆15 · Aug 10, 2023 · Updated 2 years ago
will-wiki / softmasked-bert
View on GitHub
Reproduction of Soft-Masked BERT for Chinese text correction
☆21 · May 20, 2021 · Updated 5 years ago
yangjianxin1 / Firefly-LLaMA2-Chinese
View on GitHub
Firefly Chinese LLaMA-2 large language models; supports incremental pre-training of Baichuan2, Llama2, Llama, Falcon, Qwen, Baichuan, InternLM, Bloom, and other large models
☆415 · Oct 21, 2023 · Updated 2 years ago
hlt-mt / Speech-MASSIVE
View on GitHub
Speech-MASSIVE is a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSI…
☆25 · Oct 8, 2025 · Updated 9 months ago
jcottaar / seismic
View on GitHub
Jeroen Cottaar's work for the Kaggle Geophysical Waveform Inversion competition (2nd place)
☆13 · Aug 11, 2025 · Updated 11 months ago
qinxiaoyi / TimeVarying_ASV
View on GitHub
☆12 · Oct 17, 2024 · Updated last year
ymcui / Chinese-LLaMA-Alpaca
View on GitHub
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment
☆18,945 · Apr 19, 2026 · Updated 3 months ago
5663015 / LLMs_train
View on GitHub
Instruction fine-tuning of large language models with a single codebase
☆39 · Aug 1, 2023 · Updated 2 years ago
autumn9999 / MTC-with-Category-Shifts
View on GitHub
☆12 · Oct 5, 2022 · Updated 3 years ago
shawroad / CoSENT_Pytorch
View on GitHub
CoSENT, STS, SentenceBERT
☆170 · Feb 11, 2025 · Updated last year
DLLXW / baby-llama2-chinese
View on GitHub
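A repository for pre-training a small-parameter Chinese LLaMa2 from scratch and then applying SFT; it runs on a single 24 GB GPU and yields a chat-llama2 capable of simple Chinese question answering.
☆2,923 · May 21, 2024 · Updated 2 years ago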
HarderThenHarder / transformers_tasks
View on GitHub
⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SF…
☆2,421 · Sep 29, 2023 · Updated 2 years ago
avi33 / universalmelgan
View on GitHub
This is an unofficial implementation of universal melgan according to https://arxiv.org/abs/2011.09631
☆23 · Aug 15, 2022 · Updated 3 years ago
zyh16143998882 / LCM
View on GitHub
The code for the paper "LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling" (NeurIPS'24).
☆15 · Dec 25, 2024 · Updated last year
JianGuanTHU / CommonsenseStoryGen
View on GitHub
Implementation for paper "A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation"
☆24 · Mar 1, 2020 · Updated 6 years ago
usnistgov / F4DE
View on GitHub
Framework for Detection Evaluation (F4DE) : set of evaluation tools for detection evaluations and for specific NIST-coordinated evaluatio…
☆26 · Jul 6, 2017 · Updated 9 years ago
husisy / learning
View on GitHub
☆11 · Dec 2, 2025 · Updated 7 months ago
frankyoujian / Edge-Punct-Casing
View on GitHub
☆33 · Feb 4, 2025 · Updated last year
mberr / ea-sota-comparison
View on GitHub
Code for paper "A Critical Assessment of State-of-the-Art in Entity Alignment" (https://arxiv.org/abs/2010.16314)
☆16 · Apr 1, 2023 · Updated 3 years ago
27182812 / ChatGLM-LLaMA-chinese-insturct
View on GitHub
Exploring the fine-tuning performance of Chinese instruct data on ChatGLM and LLaMA
☆387 · Apr 4, 2023 · Updated 3 years ago
zeyuxie29 / SemanticVocoder
View on GitHub
☆28 · Apr 6, 2026 · Updated 3 months ago
yanqiangmiffy / InstructGLM
View on GitHub
ChatGLM-6B instruction learning | instruction data | Instruct
☆651 · Apr 10, 2023 · Updated 3 years ago
youtubevos / vis2vos
View on GitHub
Converting VIS json label to VOS format
☆12 · Feb 16, 2021 · Updated 5 years ago
huangruizhe / audio
View on GitHub
Data manipulation and transformation for audio signal processing, powered by PyTorch
☆10 · Sep 30, 2024 · Updated last year
datemoon / tf-code-acoustics
View on GitHub
A code library for training acoustic models
☆27 · May 20, 2020 · Updated 6 years ago
atultiwari / LLaVA-Med
View on GitHub
Large Language-and-Vision Assistant for BioMedicine, built towards multimodal GPT-4 level capabilities.
☆10 · Nov 29, 2023 · Updated 2 years ago
hkust-nlp / ceval
View on GitHub
Official GitHub repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]
☆1,862 · Jul 27, 2025 · Updated last year