sxontheway / Keep-Learning
The record of what I've been through.
☆95 · Updated 3 weeks ago
Alternatives and similar repositories for Keep-Learning:
Users interested in Keep-Learning are comparing it to the repositories listed below:
- ☆52 · Updated last year
- DeepSpeed tutorials & annotated examples & study notes (efficient large-model training)☆149 · Updated last year
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode, faster than ZeRO/ZeRO++/FSDP (a pipeline-parallel sketch appears after this list).☆92 · Updated last year
- A MoE implementation for PyTorch, [ATC'23] SmartMoE (a minimal gating sketch appears after this list)☆61 · Updated last year
- Welcome to the "LLM-travel" repository! Explore the inner workings of large language models (LLMs) 🚀. Dedicated to understanding, discussing, and implementing the techniques, principles, and applications of large models.☆296 · Updated 6 months ago
- Adds sequence parallelism to LLaMA-Factory☆153 · Updated last week
- ☆76 · Updated last year
- A collection of phenomena observed during the scaling of big foundation models, which may be developed into consensus, principles, or l…☆276 · Updated last year
- DeepSpeed Tutorial☆94 · Updated 6 months ago
- see readme☆92 · Updated 2 years ago
- Survey Paper List - Efficient LLM and Foundation Models☆238 · Updated 4 months ago
- Models and examples built with OneFlow☆96 · Updated 4 months ago
- An awesome GPU task scheduler. A lightweight, easy-to-use GPU cluster task scheduling tool. Star it if you find it useful.☆170 · Updated 2 years ago
- ☆33 · Updated last year
- ☆84 · Updated last year
- Mainly records multimodal knowledge for large language model (LLM) algorithm (application) engineers☆121 · Updated 9 months ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference☆421 · Updated this week
- Train a Chinese vocabulary with sentencepiece BPE and use it in transformers (a training sketch appears after this list).☆116 · Updated last year
- A brief of TorchScript by MNIST☆107 · Updated 2 years ago
- PyTorch training code covering single precision, half precision, mixed precision, single-GPU, multi-GPU (DP / DDP), FSDP, and DeepSpeed, comparing the training speed and GPU memory usage of each method (an AMP sketch appears after this list)☆87 · Updated 11 months ago
- A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, …☆107 · Updated last year
- PyTorch distributed training tutorials☆103 · Updated this week
- Chinese instruction tuning datasets☆125 · Updated 10 months ago
- Inference code for LLaMA models☆113 · Updated last year
- Code for a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models☆58 · Updated 7 months ago
- Must-read Papers of Parameter-Efficient Tuning (Delta Tuning) Methods on Pre-trained Models.☆281 · Updated last year
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF.☆64 · Updated last year
- ☆175 · Updated 3 months ago
- Train LLaMA on a single A100 80G node using 🤗 transformers and 🚀 DeepSpeed pipeline parallelism☆214 · Updated last year
- ☆152 · Updated this week
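
Several entries above revolve around DeepSpeed pipeline parallelism. As a rough illustration of what that training mode looks like, here is a minimal sketch using DeepSpeed's `PipelineModule`; the layer sizes, stage count, and `ds_config.json` path are placeholder assumptions, not taken from any repository above, and the script assumes it is launched with the `deepspeed` launcher.

```python
import torch
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

# A toy stack of layers, partitioned across 2 pipeline stages.
# Sizes and stage count are arbitrary placeholders.
layers = [nn.Linear(1024, 1024) for _ in range(8)]
model = PipelineModule(layers=layers,
                       num_stages=2,
                       loss_fn=nn.CrossEntropyLoss())

# `ds_config.json` is a hypothetical DeepSpeed config file
# (batch sizes, optimizer, fp16 settings, etc.).
engine, _, _, _ = deepspeed.initialize(model=model,
                                       config="ds_config.json",
                                       model_parameters=model.parameters())

def batches():
    # Synthetic (input, label) pairs; a real run would use a DataLoader.
    while True:
        yield torch.randn(4, 1024), torch.randint(0, 1024, (4,))

# The pipeline engine pulls pairs from the iterator and schedules
# forward/backward micro-batches across the stages.
loss = engine.train_batch(data_iter=batches())
```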
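SmartMoE itself focuses on automated parallel training; purely to illustrate the layer structure such systems train, here is a minimal top-k gated mixture-of-experts layer in plain PyTorch. Every detail (sizes, expert MLP shape, the dense dispatch loop) is an illustrative assumption, and it deliberately ignores load balancing and expert parallelism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoE(nn.Module):
    """Minimal top-k gated mixture-of-experts over token vectors."""
    def __init__(self, d_model, num_experts=4, k=2):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts))
        self.k = k

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.gate(x)                   # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # renormalize over the top-k
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel():               # tokens routed to expert e
                out[token_ids] += weights[token_ids, slot, None] * expert(x[token_ids])
        return out

y = MoE(64)(torch.randn(10, 64))                # smoke test: 10 tokens, d_model=64
```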
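For the Chinese-vocabulary entry, the general recipe of training a BPE model with sentencepiece and loading it in transformers looks roughly like this; `corpus.txt`, the vocab size, and the choice of `LlamaTokenizer` as the wrapper class are assumptions for illustration.

```python
import sentencepiece as spm
from transformers import LlamaTokenizer

# Train a BPE vocabulary on a raw Chinese corpus
# (`corpus.txt` is a placeholder path, one sentence per line).
spm.SentencePieceTrainer.train(
    input="corpus.txt",
    model_prefix="zh_bpe",
    vocab_size=32000,
    model_type="bpe",
    character_coverage=0.9995,  # keep rare CJK characters
)

# Wrap the resulting .model file with a transformers tokenizer class
# that accepts a raw sentencepiece model.
tokenizer = LlamaTokenizer(vocab_file="zh_bpe.model")
print(tokenizer.tokenize("大语言模型"))
```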
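And for the precision-comparison entry, the mixed-precision leg of such a comparison typically uses PyTorch AMP. A minimal sketch, with the model, data, and hyperparameters as stand-ins:

```python
import torch
from torch import nn

model = nn.Linear(1024, 10).cuda()            # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()          # rescales the loss so fp16 grads don't underflow
loss_fn = nn.CrossEntropyLoss()

# Synthetic stand-in data; a real comparison would use a DataLoader.
data = [(torch.randn(32, 1024), torch.randint(0, 10, (32,))) for _ in range(10)]

for x, y in data:
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():           # run eligible ops in half precision
        loss = loss_fn(model(x.cuda()), y.cuda())
    scaler.scale(loss).backward()             # backward on the scaled loss
    scaler.step(optimizer)                    # unscales grads, then steps
    scaler.update()
```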