CoinCheung/gdGPT



Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP.

☆97

Alternatives and similar repositories for gdGPT

Users who are interested in gdGPT are comparing it to the libraries listed below.


HuangLK / transpeeder
Train LLaMA on a single A100 80G node using 🤗 Transformers and 🚀 DeepSpeed pipeline parallelism
☆224 · Nov 21, 2023 · Updated 2 years ago
SparkJiao / llama-pipeline-parallel
A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to…
☆57 · Jul 4, 2023 · Updated 3 years ago
genggui001 / Megatron-DeepSpeed-Llama
☆84 · Sep 9, 2023 · Updated 2 years ago
CSHaitao / JTR
The official repo for our SIGIR'23 full paper: Constructing Tree-based Index for Efficient and Effective Dense Retrieval
☆28 · Jun 7, 2023 · Updated 3 years ago
ssbuild / aigc_evals
aigc evals
☆10 · Dec 2, 2023 · Updated 2 years ago
airaria / AlphaZero_Gomoku_WuZiQi
My implementation of AlphaZero for Gomoku (Wu Zi Qi, 五子棋); poor man's AlphaZero
☆11 · Apr 28, 2018 · Updated 8 years ago
yangzhipeng1108 / DeepSpeed-Chat-ChatGLM
☆43 · Dec 15, 2023 · Updated 2 years ago
casys-kaist / EnvPipe
☆27 · Aug 31, 2023 · Updated 2 years ago
StibiumT16 / Robust-Fine-tuning
Code for Robust Fine-tuning (RbFT)
☆19 · Jan 31, 2025 · Updated last year
GeneZC / MiniMA
Code for the paper "Towards the Law of Capacity Gap in Distilling Language Models"
☆102 · Jul 9, 2024 · Updated 2 years ago
oriyor / turning_tables
Implementation of the paper: "Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning…
☆22 · Nov 2, 2021 · Updated 4 years ago
alibaba / Megatron-LLaMA
Best practice for training LLaMA models in Megatron-LM
☆666 · Jan 2, 2024 · Updated 2 years ago
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆191 · Apr 8, 2026 · Updated 3 months ago
SparkJiao / dpo-trajectory-reasoning
[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆84 · Jan 14, 2025 · Updated last year
jefflai108 / Semi-Supervsied-Spoken-Language-Understanding-PyTorch
Semi-supervised spoken language understanding (SLU) via self-supervised speech and language model pretraining
☆12 · Mar 23, 2021 · Updated 5 years ago
taorui-plus / Chinese-ASR-gitbook
An e-book on industrial-grade Chinese speech recognition systems
☆13 · Oct 30, 2020 · Updated 5 years ago
bigscience-workshop / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆1,448 · Mar 20, 2024 · Updated 2 years ago
JF-D / Proteus
☆24 · Jul 7, 2024 · Updated 2 years ago
HansiZeng / scaling-retriever
[SIGIR 2025] The official repo for "Scaling Sparse and Dense Retrieval in Decoder-Only LLMs"
☆22 · Mar 31, 2025 · Updated last year
NExTplusplus / L2I
The baseline method for CCIR 22: https://www.datafountain.cn/competitions/573
☆13 · Aug 2, 2022 · Updated 3 years ago
appl-lab / CuTS
☆13 · Sep 8, 2021 · Updated 4 years ago
yangrc1234 / Gomoku-Zero
A Gomoku AI based on the AlphaZero paper.
☆12 · May 1, 2023 · Updated 3 years ago
deepspeedai / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆2,257 · Aug 14, 2025 · Updated 11 months ago
hengyicai / ContrastiveLearning4Dialogue
The codebase for "Group-wise Contrastive Learning for Neural Dialogue Generation" (Cai et al., Findings of EMNLP 2020)
☆55 · Feb 24, 2021 · Updated 5 years ago
liucun-zy / Pharos-ESG-A-Hierarchical-ToC-Based-Framework-for-ESG-Report-Parsing
A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling of ESG Reports
☆16 · Nov 14, 2025 · Updated 8 months ago
YukeWang96 / QGTC_PPoPP22
Artifact for PPoPP22 QGTC: Accelerating Quantized GNN via GPU Tensor Core.
☆30 · Feb 12, 2022 · Updated 4 years ago
MachineLearningSystem / 25ASPLOS-Medusa
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆12 · Nov 8, 2024 · Updated last year
snunlp / KR-ELECTRA
KoRean based ELECTRA pre-trained models (KR-ELECTRA) for TensorFlow and PyTorch
☆15 · Feb 13, 2022 · Updated 4 years ago
liucongg / ZhiHu_Code
☆24 · Jun 24, 2020 · Updated 6 years ago
alibaba / llm-scheduling-artifact
Artifact of the OSDI '24 paper "Llumnix: Dynamic Scheduling for Large Language Model Serving"
☆64 · Jun 5, 2024 · Updated 2 years ago
Data-Intelligence-Lab / DEFT-korean-alpaca
☆23 · Oct 30, 2023 · Updated 2 years ago
keezen / ntk_alibi
NTK scaled version of ALiBi position encoding in Transformer.
☆69 · Aug 16, 2023 · Updated 2 years ago
Neutralzz / BiLLa
BiLLa: A Bilingual LLaMA with Enhanced Reasoning Ability
☆415 · Jun 1, 2023 · Updated 3 years ago
git-cloner / llama2-lora-fine-tuning
Llama 2 fine-tuning with DeepSpeed and LoRA
☆176 · Jul 28, 2023 · Updated 2 years ago
sjtu-epcc / Tacker
Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
☆33 · Feb 10, 2025 · Updated last year
allenai / allennlp-reading-comprehension-research
☆41 · Feb 12, 2019 · Updated 7 years ago
saareliad / FTPipe
FTPipe and related pipeline model parallelism research.
☆44 · May 16, 2023 · Updated 3 years ago
thomasfermi / Dynamic-Coattention-Network-for-SQuAD
TensorFlow implementation of DCN for question answering on the Stanford Question Answering Dataset (SQuAD)
☆13 · Dec 1, 2017 · Updated 8 years ago
sshh12 / Conv-VAD
A packaged convolutional voice activity detector for noisy environments.
☆14 · Jun 15, 2019 · Updated 7 years ago