foundation-model-stack / fm-training-estimator
Estimate resources needed to train LLMs
⭐13 · Updated last week
Alternatives and similar repositories for fm-training-estimator
Users who are interested in fm-training-estimator are comparing it to the libraries listed below.
- Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP. ⭐54 · Updated this week
- Create and deploy virtual-experiments - co-processing computational workflows ⭐10 · Updated 5 months ago
- InstructLab Training Library - Efficient Fine-Tuning with Message-Format Data ⭐44 · Updated this week
- Python library for Synthetic Data Generation ⭐51 · Updated 2 weeks ago
- ⭐13 · Updated 2 months ago
- llm-d benchmark scripts and tooling ⭐39 · Updated this week
- Cloud Native Benchmarking of Foundation Models ⭐44 · Updated 4 months ago
- Bridge operator repo ⭐21 · Updated 3 months ago
- ⭐51 · Updated 4 months ago
- A tool to detect infrastructure issues on cloud native AI systems ⭐52 · Updated 3 months ago
- GitHub bot to assist with the taxonomy contribution workflow ⭐17 · Updated last year
- IBM development fork of https://github.com/huggingface/text-generation-inference ⭐62 · Updated 3 months ago
- Community maintained hardware plugin for vLLM on Spyre ⭐37 · Updated this week
- A top-like tool for monitoring GPUs in a cluster ⭐85 · Updated last year
- Simplifying the definition and execution, scaling and deployment of pipelines on the cloud. ⭐235 · Updated 2 years ago
- Taxonomy tree that will allow you to create models tuned with your data ⭐287 · Updated 3 months ago
- Example ML projects that use the Determined library. ⭐32 · Updated last year
- How to build an ACP compliant agent that uses MCP as well! ⭐11 · Updated 7 months ago
- Prometheus collector and exporter for Slurm cluster metrics. A Slinky project. ⭐14 · Updated last month
- Integrations between commercial and open source applications and LSF published by IBM and others. ⭐18 · Updated last year
- ⭐23 · Updated 3 years ago
- Trusted Service Identity is closing the gap of preventing access to secrets by an untrusted operator during the process of obtaining auth… ⭐27 · Updated 3 months ago
- Controller for ModelMesh ⭐242 · Updated 6 months ago
- Python library for Evaluation ⭐16 · Updated last week
- Caikit is an AI toolkit that enables users to manage models through a set of developer friendly APIs. ⭐110 · Updated last month
- ⭐273 · Updated this week
- Module, Model, and Tensor Serialization/Deserialization ⭐279 · Updated 4 months ago
- Auto-tuning for vllm. Getting the best performance out of your LLM deployment (vllm+guidellm+optuna) ⭐25 · Updated this week
- IBM Spectrum LSF - IBM Cloud ⭐15 · Updated last year
- This repository contains the results and code for the MLPerf™ Training v4.0 benchmark. ⭐13 · Updated last year