NVIDIA / NeMo-Framework-Launcher
Provides end-to-end model development pipelines for LLMs and multimodal models that can be launched on-prem or on cloud-native platforms.
★507 · Updated 3 months ago
Alternatives and similar repositories for NeMo-Framework-Launcher
Users interested in NeMo-Framework-Launcher are comparing it to the libraries listed below.
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… · ★307 · Updated 2 months ago
- The Triton TensorRT-LLM Backend · ★872 · Updated this week
- Scalable toolkit for efficient model alignment · ★834 · Updated last week
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processor (HPU) · ★191 · Updated this week
- ★280 · Updated last week
- A tool to configure, launch and manage your machine learning experiments. · ★174 · Updated last week
- Pipeline Parallelism for PyTorch · ★775 · Updated 11 months ago
- Fast Inference Solutions for BLOOM · ★563 · Updated 9 months ago
- Large Context Attention · ★719 · Updated 6 months ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed (see the pipeline sketch after this list). · ★2,044 · Updated last month
- ★411 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs (see the offline-generation sketch after this list) · ★78 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ★266 · Updated 9 months ago
- Microsoft Automatic Mixed Precision Library · ★616 · Updated 10 months ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding · ★1,263 · Updated 5 months ago
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… (see the FP8 sketch after this list) · ★2,602 · Updated this week
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 · ★1,406 · Updated last year
- FP16×INT4 LLM inference kernel that can achieve near-ideal ~4× speedups at medium batch sizes of 16-32 tokens. · ★870 · Updated 11 months ago
- Serving multiple LoRA-finetuned LLMs as one · ★1,078 · Updated last year
- Distributed trainer for LLMs · ★578 · Updated last year
- ★120 · Updated last year
- Zero Bubble Pipeline Parallelism · ★411 · Updated 3 months ago
- Scalable data preprocessing and curation toolkit for LLMs · ★1,049 · Updated this week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ★258 · Updated last week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. · ★206 · Updated last week
- Latency and Memory Analysis of Transformer Models for Training and Inference · ★441 · Updated 3 months ago
- GPTQ inference Triton kernel · ★303 · Updated 2 years ago
- This repository contains tutorials and examples for Triton Inference Server · ★747 · Updated 2 weeks ago
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments (see the serving sketch after this list). · ★814 · Updated this week
- A throughput-oriented high-performance serving framework for LLMs · ★856 · Updated 3 weeks ago
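
The cross-references above point to the short sketches below; each is illustrative, not an official example. First, for the DeepSpeed-MII entry, a minimal sketch of its non-persistent pipeline API as documented in the project's README; the model name and generation length are assumptions, not recommendations:

```python
# A minimal sketch of DeepSpeed-MII's non-persistent pipeline.
# Model name and max_new_tokens are illustrative assumptions.
import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
responses = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=64)
print(responses)
```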
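
The two "high-throughput and memory-efficient inference and serving engine" entries carry vLLM's description. A minimal offline-generation sketch using vLLM's documented `LLM`/`SamplingParams` API; the model and sampling values are placeholders:

```python
# A minimal sketch of offline batch generation with vLLM.
# The model name and sampling settings are illustrative only.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is", "Pipeline parallelism is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

llm = LLM(model="facebook/opt-125m")  # any Hugging Face causal LM
for output in llm.generate(prompts, sampling_params):
    print(output.prompt, "->", output.outputs[0].text)
```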
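
The Transformer Engine entry accelerates Transformer building blocks with FP8 on supported GPUs. A minimal sketch of its `fp8_autocast` context manager around a `te.Linear` layer, in the spirit of the project's quickstart; the layer sizes and recipe settings are illustrative and an FP8-capable GPU (e.g. Hopper) is assumed:

```python
# A minimal sketch of FP8 execution with Transformer Engine.
# Sizes and recipe settings are illustrative, not tuned values.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(768, 768, bias=True).cuda()
x = torch.randn(16, 768, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)  # the GEMMs inside this region run in FP8

y.sum().backward()
```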
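
Finally, the PyTriton entry binds a plain Python callable to Triton Inference Server much like a Flask route handler. A minimal sketch in the style of its quickstart; the model name and toy inference function are hypothetical stand-ins:

```python
# A minimal sketch of serving a Python function with PyTriton.
# "Reverser" and the toy inference function are hypothetical.
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

@batch
def infer_fn(sequence):
    # Toy "model": reverse each input sequence along its last axis.
    reversed_seq = np.ascontiguousarray(sequence[:, ::-1])
    return {"reversed": reversed_seq}

with Triton() as triton:
    triton.bind(
        model_name="Reverser",
        infer_func=infer_fn,
        inputs=[Tensor(name="sequence", dtype=np.int64, shape=(-1,))],
        outputs=[Tensor(name="reversed", dtype=np.int64, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()  # blocks, serving HTTP/gRPC requests until interrupted
```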