AdrianBZG / LLM-distributed-finetune
Efficiently fine-tune any LLM from HuggingFace using distributed training (multi-GPU) and DeepSpeed. Uses Ray AIR to orchestrate training across multiple AWS GPU instances.
☆50 · Updated last year
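As a rough illustration of the DeepSpeed side of this setup, a minimal ZeRO stage-2 configuration might look like the sketch below. All values are illustrative placeholders, not the repository's actual config:

```json
{
  "train_batch_size": 32,
  "gradient_accumulation_steps": 4,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu" }
  }
}
```

Ray AIR launches one training worker per GPU across the instances; each worker hands a config like this to DeepSpeed when initializing the model. The repository's real configuration may differ in stage, precision, and batch sizes.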
Related projects:
- Experiments with inference on Llama (☆106, updated 3 months ago)
- Pretrain, finetune and serve LLMs on Intel platforms with Ray (☆95, updated this week)
- Benchmark suite for LLMs from Fireworks.ai (☆51, updated this week)
- Ring-attention experiments (☆89, updated 5 months ago)
- The code for the paper "RouterBench: A Benchmark for Multi-LLM Routing System" (☆86, updated 3 months ago)
- Experiments on speculative sampling with Llama models (☆114, updated last year)
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… (☆218, updated 5 months ago)
- Manage scalable open LLM inference endpoints in Slurm clusters (☆217, updated 2 months ago)
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day (☆248, updated 10 months ago)
- Batched LoRAs (☆327, updated last year)
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… (☆150, updated last week)
- ReLM is a Regular Expression engine for Language Models (☆100, updated last year)
- The official repo for "LLoCO: Learning Long Contexts Offline" (☆104, updated 3 months ago)
- 🕹️ Performance comparison of MLOps engines, frameworks, and languages on mainstream AI models (☆129, updated last month)
- Breaking the throughput-latency trade-off for long sequences with speculative decoding (☆55, updated this week)
- Fast inference of MoE models with CPU-GPU orchestration (☆163, updated 3 months ago)
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM (☆416, updated this week)
- A set of scripts and notebooks on LLM fine-tuning and dataset creation (☆89, updated last week)
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… (☆73, updated last month)
- LLM serving performance evaluation harness (☆45, updated 3 weeks ago)
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" (☆57, updated 5 months ago)
- Benchmark baseline for retrieval QA applications (☆90, updated 5 months ago)