lvyufeng / Cybertron
MindSpore implementation of transformers
☆66 · Updated 2 years ago
Alternatives and similar repositories for Cybertron:
Users interested in Cybertron are comparing it to the libraries listed below.
- Natural Language Processing Tutorial for MindSpore Users ☆142 · Updated 10 months ago
- MindSpore implementation of Dive into Deep Learning (《动手学深度学习》), for MindSpore learners to use alongside Mu Li's course. ☆113 · Updated last year
- An implementation of Transformer, BERT, GPT, and diffusion models for learning purposes ☆151 · Updated 4 months ago
- MindSpore implementations of Generative Adversarial Networks. ☆22 · Updated 2 years ago
- ☆62 · Updated last week
- ATC23 AE ☆45 · Updated last year
- ☆62 · Updated last month
- ☆18 · Updated 2 years ago
- Transformer model based on the Gated Attention Unit (preview version) ☆97 · Updated last year
- PyTorch distributed training ☆63 · Updated last year
- ☆76 · Updated last year
- LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training ☆398 · Updated last month
- Inference code for LLaMA models ☆113 · Updated last year
- Adds Sequence Parallelism into LLaMA-Factory ☆154 · Updated this week
- ☆57 · Updated 2 months ago
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems ☆49 · Updated last month
- Inference code for the paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models" ☆35 · Updated 6 months ago
- A MoE implementation for PyTorch, [ATC'23] SmartMoE ☆61 · Updated last year
- [ICLR 2025] PEARL: parallel speculative decoding with adaptive draft length ☆39 · Updated this week
- ☆84 · Updated last year
- Pretrain CPM-1 ☆51 · Updated 3 years ago
- ☆52 · Updated last year
- Must-read papers on Parameter-Efficient Tuning (Delta Tuning) methods for pre-trained models ☆281 · Updated last year
- Models and examples built with OneFlow ☆96 · Updated 4 months ago
- Advanced interview notes for large language models ☆26 · Updated last week
- A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, … ☆107 · Updated last year
- ☆177 · Updated 4 months ago
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP. ☆92 · Updated last year
- A paper list about diffusion models for natural language processing. ☆181 · Updated last year