AdrianBZG / LLM-distributed-finetune

Efficiently fine-tune any LLM from HuggingFace using distributed training (multiple GPUs) and DeepSpeed. Uses Ray AIR to orchestrate training across multiple AWS GPU instances.

☆59

Alternatives and similar repositories for LLM-distributed-finetune

Users who are interested in LLM-distributed-finetune are comparing it to the repositories listed below.

intel / llm-on-ray
Pretrain, finetune and serve LLMs on Intel platforms with Ray
☆130 · Updated 2 months ago
hamelsmu / llama-inference
experiments with inference on llama
☆103 · Updated last year
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆84 · Updated last week
run-ai / llmperf
☆58 · Updated last year
llm-efficiency-challenge / neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day
☆258 · Updated 2 years ago
huggingface / optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…
☆320 · Updated 2 months ago
snowflakedb / ArcticTraining
ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs)
☆257 · Updated this week
JiahaoYao / awesome-ray
Ray - A curated list of resources: https://github.com/ray-project/ray
☆73 · Updated last month
sabetAI / BLoRA
batched loras
☆347 · Updated 2 years ago
bentoml / llm-bench
☆56 · Updated last year
huggingface / llm-swarm
Manage scalable open LLM inference endpoints in Slurm clusters
☆277 · Updated last year
mkuchnik / relm
ReLM is a Regular Expression engine for Language Models
☆107 · Updated 2 years ago
premAI-io / benchmarks
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
☆139 · Updated last year
snowflakedb / ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆327 · Updated this week
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆267 · Updated last year
anyscale / llm-continuous-batching-benchmarks
☆122 · Updated last year
sangmichaelxie / cs324_p2
Project 2 (Building Large Language Models) for Stanford CS324: Understanding and Developing Large Language Models (Winter 2022)
☆105 · Updated 2 years ago
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆127 · Updated 2 years ago
ray-project / llmperf-leaderboard
☆473 · Updated last year
skypilot-org / skypilot-tutorial
Tutorial to get started with SkyPilot!
☆58 · Updated last year
withmartian / routerbench
The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System
☆151 · Updated last year
project-etalon / etalon
LLM Serving Performance Evaluation Harness
☆82 · Updated 9 months ago
LLM360 / amber-data-prep
Data preparation code for Amber 7B LLM
☆93 · Updated last year
allenai / fm-cheatsheet
Website for hosting the Open Foundation Models Cheat Sheet.
☆269 · Updated 7 months ago
vllm-project / speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
☆140 · Updated this week
jlscheerer / xtr-warp
XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR.
☆173 · Updated 7 months ago
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆278 · Updated 2 years ago
EmbeddedLLM / vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
☆93 · Updated last week
lapp0 / lm-inference-engines
Comparison of Language Model Inference Engines
☆237 · Updated 11 months ago
ServiceNow / Fast-LLM
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
☆265 · Updated this week