hkproj / pytorch-transformer-distributed

Distributed training (multi-node) of a Transformer model

☆75

Alternatives and similar repositories for pytorch-transformer-distributed

Users who are interested in pytorch-transformer-distributed are comparing it to the libraries listed below.

hkproj / dpo-notes
Notes on Direct Preference Optimization
☆21Updated last year
fangyuan-ksgk / Tiny-GRPO
minimal GRPO implementation from scratch
☆94Updated 4 months ago
hkproj / triton-flash-attention
☆184Updated 7 months ago
rasbt / dora-from-scratch
LoRA and DoRA from Scratch Implementations
☆207Updated last year
melisa-writer / short-transformers
Prune transformer layers
☆69Updated last year
neubig / minllama-assignment
☆90Updated 10 months ago
mingyin0312 / RL4LLM
RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct
☆29Updated 5 months ago
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MoE models.
☆164Updated 4 months ago
hkproj / pytorch-llama
LLaMA 2 implemented from scratch in PyTorch
☆343Updated last year
hkproj / rlhf-ppo
Notes and commented code for RLHF (PPO)
☆101Updated last year
cmu-l3 / anlp-spring2025-code
Advanced NLP, Spring 2025 https://cmu-l3.github.io/anlp-spring2025/
☆61Updated 4 months ago
ThinamXx / Meta-llama
Complete implementation of Llama2 with/without KV cache & inference 🚀
☆48Updated last year
hkproj / pytorch-lora
LoRA: Low-Rank Adaptation of Large Language Models implemented using PyTorch
☆112Updated 2 years ago
hkproj / quantization-notes
Notes on quantization in neural networks
☆95Updated last year
huggingface / picotron_tutorial
☆206Updated 5 months ago
tcapelle / llm_recipes
A set of scripts and notebooks on LLM fine-tuning and dataset creation
☆110Updated 10 months ago
hkproj / multi-latent-attention
☆43Updated 2 months ago
arpita8 / Awesome-Mixture-of-Experts-Papers
Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts.
☆128Updated 11 months ago
aju22 / LLaMA2
This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)…
☆69Updated last year
CASE-Lab-UMD / LLM-Drop
The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
☆174Updated 4 months ago
facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆323Updated 3 months ago
hkproj / pytorch-llama-notes
Notes about LLaMA 2 model
☆66Updated last year
facebookresearch / RAM
A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).
☆263Updated last week
alperiox / Compact-Language-Models-via-Pruning-and-Knowledge-Distillation
Unofficial implementation of https://arxiv.org/pdf/2407.14679
☆48Updated 10 months ago
daniel-furman / sft-demos
Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets.
☆77Updated 9 months ago
joey00072 / nanoGRPO
nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)
☆113Updated 2 months ago
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆188Updated 2 months ago
1y33 / 100Days
GPU Kernels
☆191Updated 3 months ago
nyunAI / Faster-LLM-Survey
☆42Updated last year
gpu-mode / profiling-cuda-in-torch
☆162Updated last year