lvyufeng / Cybertron
MindSpore implementation of Transformers
☆66 Updated 2 years ago
Alternatives and similar repositories for Cybertron
Users interested in Cybertron are comparing it to the libraries listed below.
- Natural Language Processing Tutorial for MindSpore Users ☆142 Updated last year
- ☆18 Updated 2 years ago
- MindSpore implementations of Generative Adversarial Networks. ☆22 Updated 2 years ago
- A MindSpore implementation of "Dive into Deep Learning" (《动手学深度学习》), for MindSpore learners to use alongside Mu Li's course. ☆117 Updated last year
- ☆33 Updated 4 months ago
- an implementation of parallelism techniques such as AMP, DDP, PP, and TP, for learning purposes ☆13 Updated last year
- an implementation of Transformer, BERT, GPT, and diffusion models for learning purposes ☆154 Updated 7 months ago
- A Transformer model based on the Gated Attention Unit (early-preview version) ☆97 Updated 2 years ago
- ☆125 Updated 2 weeks ago
- Must-read papers on improving efficiency for pre-trained language models. ☆103 Updated 2 years ago
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ☆81 Updated last month
- The record of what I've been through. ☆98 Updated 3 months ago
- A convenient script for grabbing GPUs. ☆330 Updated 3 months ago
- A collection of phenomena observed during the scaling of big foundation models, which may be developed into consensus, principles, or l… ☆280 Updated last year
- Inference code for LLaMA models ☆120 Updated last year
- [KDD'22] Learned Token Pruning for Transformers ☆97 Updated 2 years ago
- ☆70 Updated 2 months ago
- ☆79 Updated last year
- ATC23 AE ☆45 Updated 2 years ago
- An introduction to the basic concepts of Transformers and the key techniques behind their recent advances. ☆49 Updated last year
- A clean, easy-to-use TinyBert: a pre-trained language model built by knowledge distillation from BERT ☆264 Updated 4 years ago
- ☆35 Updated last year
- ☆52 Updated last year
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (… ☆78 Updated this week
- News and blog posts on new computer architectures such as RISC-V, xPUs, and ASICs ☆14 Updated 5 years ago
- Implementation of FlashAttention in PyTorch ☆146 Updated 4 months ago
- [ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408 ☆195 Updated 2 years ago
- qwen-nsa ☆61 Updated last month
- [ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs. ☆111 Updated last year
- Multi-Candidate Speculative Decoding ☆35 Updated last year