foundation-model-stack / fm-training-estimator
Estimate resources needed to train LLMs
☆13 · Updated 9 months ago
Alternatives and similar repositories for fm-training-estimator
Users who are interested in fm-training-estimator compare it to the libraries listed below.
- 🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP. ☆52 · Updated this week
- InstructLab Training Library - Efficient Fine-Tuning with Message-Format Data ☆44 · Updated last week
- Create and deploy virtual-experiments - co-processing computational workflows ☆10 · Updated 4 months ago
- llm-d benchmark scripts and tooling ☆33 · Updated this week
- ☆13 · Updated last month
- Simplifying the definition and execution, scaling and deployment of pipelines on the cloud. ☆236 · Updated 2 years ago
- ☆51 · Updated 3 months ago
- Python library for Synthetic Data Generation ☆51 · Updated this week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆62 · Updated 2 months ago
- 🚀 Collection of libraries used with fms-hf-tuning to accelerate fine-tuning and training of large models. ☆13 · Updated this week
- Community maintained hardware plugin for vLLM on Spyre ☆37 · Updated this week
- An extendible framework for executing benchmarks and computational experiments at scale ☆34 · Updated this week
- A top-like tool for monitoring GPUs in a cluster ☆85 · Updated last year
- A tool to detect infrastructure issues on cloud native AI systems ☆52 · Updated 2 months ago
- This repository contains the results and code for the MLPerf™ Training v4.0 benchmark. ☆13 · Updated last year
- Cloud Native Benchmarking of Foundation Models ☆44 · Updated 4 months ago
- Auto-tuning for vllm. Getting the best performance out of your LLM deployment (vllm+guidellm+optuna) ☆21 · Updated last week
- Prometheus collector and exporter for Slurm cluster metrics. A Slinky project. ☆14 · Updated 3 weeks ago
- Helm charts for llm-d ☆50 · Updated 4 months ago
- Taxonomy tree that will allow you to create models tuned with your data ☆287 · Updated 2 months ago
- An intuitive, easy-to-use python interface for batch resource requesting, access, job submission, and observation. Simplifying the develo… ☆32 · Updated this week
- ☆57 · Updated last week
- LM engine is a library for pretraining/finetuning LLMs ☆77 · Updated last week
- Cray-HPE System Management Documentation for Shasta, High-Performance-Computing-as-a-Service (HPCaaS). ☆31 · Updated this week
- Bridge operator repo ☆21 · Updated 2 months ago
- This is the open source version of HPL-MXP. The code performance has been verified on Frontier ☆18 · Updated 4 months ago
- MLPerf™ logging library ☆37 · Updated last month
- Queuing and quota management for AI/ML batch jobs on Kubernetes ☆14 · Updated 4 months ago
- ☆267 · Updated this week
- The project delivers a comprehensive full-stack solution for the Intel® Enterprise AI Foundation on the OpenShift platform to provision I… ☆20 · Updated 4 months ago