lvyufeng / Cybertron
MindSpore implementation of Transformers
☆65Updated last year
Related projects:
- Natural Language Processing Tutorial for MindSpore Users☆139Updated 5 months ago
- ☆18Updated last year
- MindSpore implementation of "Dive into Deep Learning" (《动手学深度学习》). For MindSpore learners to use alongside Mu Li's course.☆102Updated last year
- PyTorch distributed tutorials☆72Updated 3 weeks ago
- MindSpore implementations of Generative Adversarial Networks.☆21Updated 2 years ago
- an implementation of transformer, bert, gpt, and diffusion models for learning purposes☆139Updated 10 months ago
- ☆49Updated last year
- Implementation of FlashAttention in PyTorch☆95Updated last year
- ☆125Updated last week
- Transformer model based on the Gated Attention Unit (early preview version)☆95Updated last year
- Must-read papers on improving efficiency for pre-trained language models.☆100Updated last year
- ☆170Updated 5 months ago
- A PyTorch-like automatic differentiation tool implemented in pure Python, for learning purposes.☆45Updated 4 months ago
- ☆48Updated 3 weeks ago
- PyTorch distributed training☆57Updated last year
- A paper list about diffusion models for natural language processing.☆170Updated last year
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models☆182Updated 4 months ago
- Code for a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models☆51Updated 3 months ago
- Efficient, Low-Resource, Distributed transformer implementation based on BMTrain☆233Updated 9 months ago
- A minimalist and extensible PyTorch extension for implementing custom backend operators in PyTorch.☆25Updated 5 months ago
- Blog posts, reading reports, and code examples on AGI/LLM-related topics.☆11Updated last month
- A MoE impl for PyTorch, [ATC'23] SmartMoE☆56Updated last year
- ☆82Updated last year
- Grab GPU whenever available☆272Updated 2 years ago
- ☆131Updated last week
- Inference code for LLaMA models☆101Updated last year
- A collection of phenomena observed during the scaling of big foundation models, which may be developed into consensus, principles, or l…☆274Updated last year
- Example code and baseline implementation for Arena Challenge 3 (擂台赛3), a large-scale pre-training tuning competition☆38Updated last year
- (Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from …☆116Updated 4 months ago
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment☆191Updated 4 months ago