AdrianBZG / LLM-distributed-finetune
Efficiently fine-tune any LLM from Hugging Face using distributed training (multiple GPUs) and DeepSpeed. Uses Ray AIR to orchestrate training across multiple AWS GPU instances.
☆53 · Updated last year
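The repository pairs Ray AIR orchestration with a DeepSpeed configuration handed to each training worker. As a rough illustration of the kind of DeepSpeed ZeRO config involved (a sketch only: the key names are standard DeepSpeed options, but the specific values and the 4-worker count are hypothetical, not taken from this repo):

```python
# Sketch of a DeepSpeed ZeRO config a Ray-launched worker might receive.
# Values are illustrative, not the repository's actual settings.
NUM_WORKERS = 4  # hypothetical number of GPU workers Ray would launch

deepspeed_config = {
    "train_micro_batch_size_per_gpu": 1,   # per-GPU batch size
    "gradient_accumulation_steps": 8,      # accumulate before each optimizer step
    "bf16": {"enabled": True},             # bfloat16 mixed precision
    "zero_optimization": {
        "stage": 3,                        # ZeRO-3: shard params, grads, optimizer state
        "offload_optimizer": {"device": "cpu"},  # push optimizer state to host RAM
    },
}

# Effective global batch size seen by the optimizer:
effective_batch = (
    deepspeed_config["train_micro_batch_size_per_gpu"]
    * deepspeed_config["gradient_accumulation_steps"]
    * NUM_WORKERS
)
print(effective_batch)  # 1 * 8 * 4 = 32
```

ZeRO-3 with CPU offload is the usual lever for fitting large models across modest GPU fleets; the effective batch size is what the micro-batch, accumulation steps, and worker count multiply out to.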
Related projects
Alternatives and complementary repositories for LLM-distributed-finetune
- Pretrain, fine-tune, and serve LLMs on Intel platforms with Ray ☆103 · Updated last week
- Experiments with inference on Llama ☆105 · Updated 5 months ago
- Modular and structured prompt caching for low-latency LLM inference ☆69 · Updated 2 weeks ago
- Efficient, Flexible and Portable Structured Generation ☆125 · Updated this week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components ☆166 · Updated this week
- Applied AI experiments and examples for PyTorch ☆168 · Updated 3 weeks ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆238 · Updated 4 months ago
- Materials for learning SGLang ☆110 · Updated this week
- Ring-attention experiments ☆97 · Updated last month
- Benchmark suite for LLMs from Fireworks.ai ☆59 · Updated 2 weeks ago
- Packages and instructions for training and inference of LLMs on NVIDIA's new GH200 machines ☆19 · Updated 2 months ago
- Code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆94 · Updated 5 months ago
- Fast inference of MoE models with CPU-GPU orchestration ☆173 · Updated last week
- Cold Compress is a hackable, lightweight, open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆87 · Updated 3 months ago
- ReLM is a Regular Expression engine for Language Models ☆104 · Updated last year
- vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs ☆89 · Updated this week
- Ray - A curated list of resources: https://github.com/ray-project/ray ☆42 · Updated last year
- LLM Serving Performance Evaluation Harness ☆57 · Updated 2 months ago
- Batched LoRAs ☆336 · Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ☆262 · Updated last year
- 🕹️ Performance comparison of MLOps engines, frameworks, and languages on mainstream AI models ☆134 · Updated 3 months ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day ☆252 · Updated last year
- Breaking the throughput-latency trade-off for long sequences with speculative decoding ☆79 · Updated last week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆194 · Updated this week