lvyufeng / cybertron-ai
MindSpore implementation of transformers
☆68 · Updated 2 years ago
Alternatives and similar repositories for cybertron-ai
Users interested in cybertron-ai are comparing it to the libraries listed below.
- Natural Language Processing Tutorial for MindSpore Users ☆140 · Updated last year
- MindSpore implementations of Generative Adversarial Networks ☆23 · Updated 3 years ago
- Model Compression for Big Models ☆165 · Updated 2 years ago
- ATC23 AE ☆47 · Updated 2 years ago
- An awesome GPU task scheduler. A lightweight, easy-to-use tool for scheduling jobs on GPU clusters; give it a star if you find it useful ☆193 · Updated 3 years ago
- ☆79 · Updated last year
- MindSpore implementation of "Dive into Deep Learning" (《动手学深度学习》), for MindSpore learners to use alongside Mu Li's course ☆123 · Updated 2 years ago
- ☆150 · Updated 5 months ago
- Triton Documentation in Simplified Chinese / Triton 中文文档 ☆95 · Updated 3 weeks ago
- LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training ☆406 · Updated 4 months ago
- ☆18 · Updated 3 years ago
- Inference code for LLaMA models ☆128 · Updated 2 years ago
- Efficient, Low-Resource, Distributed transformer implementation based on BMTrain ☆263 · Updated 2 years ago
- Models and examples built with OneFlow ☆100 · Updated last year
- A MoE implementation for PyTorch, [ATC'23] SmartMoE ☆70 · Updated 2 years ago
- The record of what I've been through. Now moved to Notion; see link below ☆101 · Updated 10 months ago
- ☆63 · Updated last year
- ☆51 · Updated 2 years ago
- An implementation of parallelism techniques such as AMP, DDP, PP, and TP, for learning purposes ☆14 · Updated 2 years ago
- A PyTorch-like automatic differentiation tool implemented in pure Python, for learning purposes ☆51 · Updated last year
- ☆36 · Updated 11 months ago
- An implementation of Transformer, BERT, GPT, and diffusion models for learning purposes ☆159 · Updated last year
- Collaborative Training of Large Language Models in an Efficient Way ☆417 · Updated last year
- A Transformer model based on the Gated Attention Unit (preview version) ☆98 · Updated 2 years ago
- ☆84 · Updated 2 years ago
- DeepSpeed tutorials, annotated examples, and study notes (efficient training of large models) ☆183 · Updated 2 years ago
- qwen-nsa ☆84 · Updated last month
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ☆213 · Updated 10 months ago
- (Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from …" ☆185 · Updated last year
- Implementation of FlashAttention in PyTorch ☆176 · Updated 11 months ago