hkproj / pytorch-transformer-distributed
Distributed training (multi-node) of a Transformer model
☆75Updated last year
Alternatives and similar repositories for pytorch-transformer-distributed
Users who are interested in pytorch-transformer-distributed are comparing it to the repositories listed below
- Notes on Direct Preference Optimization☆21Updated last year
- minimal GRPO implementation from scratch☆94Updated 4 months ago
- ☆184Updated 7 months ago
- LoRA and DoRA from Scratch Implementations☆207Updated last year
- Prune transformer layers☆69Updated last year
- ☆90Updated 10 months ago
- RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct☆29Updated 5 months ago
- An extension of the nanoGPT repository for training small MOE models.☆164Updated 4 months ago
- LLaMA 2 implemented from scratch in PyTorch☆343Updated last year
- Notes and commented code for RLHF (PPO)☆101Updated last year
- Advanced NLP, Spring 2025 https://cmu-l3.github.io/anlp-spring2025/☆61Updated 4 months ago
- Complete implementation of Llama2 with/without KV cache & inference 🚀☆48Updated last year
- LORA: Low-Rank Adaptation of Large Language Models implemented using PyTorch☆112Updated 2 years ago
- Notes on quantization in neural networks☆95Updated last year
- ☆206Updated 5 months ago
- A set of scripts and notebooks on LLM fine-tuning and dataset creation☆110Updated 10 months ago
- ☆43Updated 2 months ago
- Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts.☆128Updated 11 months ago
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)…☆69Updated last year
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".☆174Updated 4 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024☆323Updated 3 months ago
- Notes about LLaMA 2 model☆66Updated last year
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).☆263Updated last week
- Unofficial implementation of https://arxiv.org/pdf/2407.14679☆48Updated 10 months ago
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets.☆77Updated 9 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆113Updated 2 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand☆188Updated 2 months ago
- GPU Kernels☆191Updated 3 months ago
- ☆42Updated last year
- ☆162Updated last year