lvyufeng / Cybertron
MindSpore implementation of transformers
☆66 · Updated 2 years ago
Alternatives and similar repositories for Cybertron:
Users interested in Cybertron are comparing it to the libraries listed below.
- Natural Language Processing Tutorial for MindSpore Users ☆142 · Updated 10 months ago
- MindSpore implementation of Dive into Deep Learning (《动手学深度学习》), for MindSpore learners to use alongside Mu Li's course. ☆113 · Updated last year
- An implementation of Transformer, BERT, GPT, and diffusion models for learning purposes ☆151 · Updated 4 months ago
- MindSpore implementations of Generative Adversarial Networks. ☆22 · Updated 2 years ago
- ☆62 · Updated last week
- ATC23 AE ☆45 · Updated last year
- ☆62 · Updated last month
- ☆18 · Updated 2 years ago
- Transformer model based on the Gated Attention Unit (preview version) ☆97 · Updated last year
- PyTorch distributed training ☆63 · Updated last year
- ☆76 · Updated last year
- LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training ☆398 · Updated last month
- Inference code for LLaMA models ☆113 · Updated last year
- Adds Sequence Parallelism into LLaMA-Factory ☆154 · Updated this week
- ☆57 · Updated 2 months ago
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems ☆49 · Updated last month
- Inference code for the paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models" ☆35 · Updated 6 months ago
- A MoE implementation for PyTorch, [ATC'23] SmartMoE ☆61 · Updated last year
- [ICLR 2025] PEARL: parallel speculative decoding with adaptive draft length ☆39 · Updated this week
- ☆84 · Updated last year
- Pretrain CPM-1 ☆51 · Updated 3 years ago
- ☆52 · Updated last year
- Must-read papers on Parameter-Efficient Tuning (Delta Tuning) methods for pre-trained models ☆281 · Updated last year
- Models and examples built with OneFlow ☆96 · Updated 4 months ago
- Advanced interview notes for large language models ☆26 · Updated last week
- A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, … ☆107 · Updated last year
- ☆177 · Updated 4 months ago
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP. ☆92 · Updated last year
- A paper list about diffusion models for natural language processing. ☆181 · Updated last year