lvyufeng / Cybertron
MindSpore implementation of transformers
☆66 · Updated 2 years ago
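For orientation, the sketch below shows what a minimal single-head scaled dot-product attention block looks like in MindSpore. This is an illustrative example written for this page, not code taken from Cybertron, and it assumes the MindSpore 2.x API (`nn.Cell`, `nn.Dense`, `ops.matmul`).

```python
import numpy as np
import mindspore as ms
from mindspore import nn, ops

class SingleHeadAttention(nn.Cell):
    """Minimal scaled dot-product self-attention (illustrative sketch only)."""

    def __init__(self, dim):
        super().__init__()
        self.scale = dim ** -0.5
        # Separate linear projections for queries, keys, and values.
        self.q_proj = nn.Dense(dim, dim)
        self.k_proj = nn.Dense(dim, dim)
        self.v_proj = nn.Dense(dim, dim)
        self.softmax = nn.Softmax(axis=-1)

    def construct(self, x):
        # x: (batch, seq_len, dim)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Attention weights: softmax(Q K^T / sqrt(d)).
        scores = ops.matmul(q, ops.transpose(k, (0, 2, 1))) * self.scale
        return ops.matmul(self.softmax(scores), v)

x = ms.Tensor(np.random.randn(2, 8, 16), ms.float32)
print(SingleHeadAttention(16)(x).shape)  # (2, 8, 16)
```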
Alternatives and similar repositories for Cybertron:
Users interested in Cybertron are comparing it to the libraries listed below.
- Natural Language Processing Tutorial for MindSpore Users ☆142 · Updated 11 months ago
- MindSpore implementations of Generative Adversarial Networks. ☆22 · Updated 2 years ago
- MindSpore implementation of "Dive into Deep Learning" (《动手学深度学习》), for MindSpore learners following Mu Li's (李沐) course. ☆115 · Updated last year
- PyTorch distributed training ☆64 · Updated last year
- A Transformer model based on the Gated Attention Unit (early-preview version) ☆97 · Updated 2 years ago
- An introduction to basic concepts of Transformers and key techniques of their recent advances. ☆49 · Updated last year
- A Tight-fisted Optimizer ☆47 · Updated 2 years ago
- Model Compression for Big Models ☆158 · Updated last year
- Must-read papers on improving efficiency for pre-trained language models. ☆103 · Updated 2 years ago
- A paper list about diffusion models for natural language processing. ☆182 · Updated last year
- Pretrain CPM-1 ☆51 · Updated 3 years ago
- Lion and Adam optimization comparison ☆60 · Updated 2 years ago
- An implementation of Transformer, BERT, GPT, and diffusion models for learning purposes ☆152 · Updated 5 months ago
- [ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs. ☆109 · Updated 10 months ago
- RoFormer V1 & V2 in PyTorch ☆491 · Updated 2 years ago
- A framework for training, evaluating, and testing models in PyTorch. ☆82 · Updated 2 years ago
- seq_2_seq text generation based on transformers ☆24 · Updated 4 years ago
- A simple experiment with Ladder Side-Tuning on CLUE ☆19 · Updated 2 years ago
- A PyTorch-like automatic differentiation tool implemented in pure Python, for learning purposes. ☆51 · Updated 11 months ago
- 📊 A simple command-line utility for querying and monitoring GPU status ☆90 · Updated 2 years ago
- [KDD'22] Learned Token Pruning for Transformers ☆96 · Updated 2 years ago
- ATC23 AE ☆45 · Updated last year
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP. ☆95 · Updated last year
- Inference code for LLaMA models ☆118 · Updated last year