AdrianBZG / LLM-distributed-finetune
Efficiently fine-tune any LLM from Hugging Face using distributed training (multiple GPUs) and DeepSpeed. Uses Ray AIR to orchestrate training across multiple AWS GPU instances.
☆53 · Updated last year
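The DeepSpeed side of such a setup is driven by a JSON-style configuration. A minimal illustrative ZeRO stage-2 config, expressed as the Python dict that DeepSpeed and the Hugging Face Trainer accept, might look like the following (the values are examples, not this repository's actual settings):

```python
# Illustrative DeepSpeed configuration: ZeRO stage 2 with CPU optimizer
# offload. Values are examples only, not taken from LLM-distributed-finetune.
ds_config = {
    "train_batch_size": 32,              # global batch size across all GPUs
    "gradient_accumulation_steps": 4,
    "fp16": {"enabled": True},           # mixed-precision training
    "zero_optimization": {
        "stage": 2,                      # shard optimizer state + gradients
        "offload_optimizer": {"device": "cpu"},
    },
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 2e-5, "weight_decay": 0.01},
    },
}
```

ZeRO stage 2 shards optimizer state and gradients across workers, which is what makes fine-tuning larger models feasible on multiple smaller GPU instances.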
Alternatives and similar repositories for LLM-distributed-finetune:
Users interested in LLM-distributed-finetune are comparing it to the libraries listed below.
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆108 · Updated 2 months ago
- ☆114 · Updated 10 months ago
- Experiments with inference on Llama ☆104 · Updated 7 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆64 · Updated last month
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆101 · Updated 7 months ago
- Docker image for NVIDIA GH200 machines, optimized for vLLM serving and HF Trainer fine-tuning ☆23 · Updated 2 weeks ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆182 · Updated this week
- Ray - A curated list of resources: https://github.com/ray-project/ray ☆47 · Updated last year
- ☆41 · Updated last month
- Manage scalable open LLM inference endpoints in Slurm clusters ☆247 · Updated 6 months ago
- Materials for learning SGLang ☆166 · Updated last week
- ☆215 · Updated this week
- ReLM is a Regular Expression engine for Language Models ☆103 · Updated last year
- ☆25 · Updated last year
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆89 · Updated this week
- Data preparation code for Amber 7B LLM ☆84 · Updated 8 months ago
- ☆43 · Updated 6 months ago
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆107 · Updated last month
- LLM Serving Performance Evaluation Harness ☆65 · Updated 4 months ago
- ☆150 · Updated this week
- Experiments on speculative sampling with Llama models ☆122 · Updated last year
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆183 · Updated last month
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆64 · Updated 4 months ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day ☆253 · Updated last year
- Applied AI experiments and examples for PyTorch ☆211 · Updated this week
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆114 · Updated 7 months ago
- Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers ☆204 · Updated 4 months ago
- Simple and efficient pytorch-native transformer training and inference (batched) ☆66 · Updated 9 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆215 · Updated this week
- Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios. ☆88 · Updated 5 months ago
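Dynamic batching, the technique the last entry refers to, can be illustrated with a minimal pure-Python sketch: incoming requests are queued and flushed either when a batch fills or when the oldest request has waited past a timeout. The class and parameter names below are illustrative, not that library's API:

```python
import time
from queue import Queue, Empty

class DynamicBatcher:
    """Illustrative dynamic batcher: groups incoming requests into batches,
    flushing when the batch is full or the oldest request has waited too long."""

    def __init__(self, max_batch_size=4, max_wait_s=0.05):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = Queue()

    def submit(self, request):
        self.queue.put(request)

    def next_batch(self):
        """Block until at least one request arrives, then collect more
        requests until the batch fills or the wait deadline passes."""
        batch = [self.queue.get()]                 # wait for the first request
        deadline = time.monotonic() + self.max_wait_s
        while len(batch) < self.max_batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self.queue.get(timeout=remaining))
            except Empty:
                break                              # deadline hit with a partial batch
        return batch

batcher = DynamicBatcher(max_batch_size=3, max_wait_s=0.05)
for prompt in ["a", "b", "c", "d"]:
    batcher.submit(prompt)

first = batcher.next_batch()   # flushes as soon as the batch fills
second = batcher.next_batch()  # leftover request returned after the timeout
```

The trade-off this encodes is latency versus throughput: a larger `max_wait_s` yields fuller batches (better GPU utilization) at the cost of per-request latency.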