NVIDIA / NeMo-Framework-Launcher
Provides end-to-end model development pipelines for LLMs and multimodal models that can be launched on-prem or in cloud-native environments.
⭐ 505 · Updated 2 months ago
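For a sense of how such a launcher is typically driven, here is a minimal sketch of kicking off a training stage from Python. The Hydra-style overrides, stage and recipe names, and directory layout below are illustrative assumptions, not the project's documented interface.

```python
# Minimal sketch, not the project's documented API: the launcher is driven by a
# Hydra-style CLI, so a run boils down to invoking its entry script with config
# overrides. All override keys/values here are assumptions that vary by release.
import subprocess

subprocess.run(
    [
        "python", "main.py",
        "stages=[training]",   # hypothetical: which pipeline stage(s) to run
        "cluster=bcm",         # hypothetical: Slurm-based cluster profile
        "training=gpt3/5b",    # hypothetical: model recipe from the config tree
    ],
    cwd="NeMo-Framework-Launcher/launcher_scripts",  # assumed checkout layout
    check=True,  # raise if the launcher exits non-zero
)
```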
Alternatives and similar repositories for NeMo-Framework-Launcher
Users interested in NeMo-Framework-Launcher are comparing it to the libraries listed below.
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ⭐ 305 · Updated last month
- The Triton TensorRT-LLM Backend ⭐ 859 · Updated last week
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ⭐ 190 · Updated this week
- A tool to configure, launch and manage your machine learning experiments. ⭐ 171 · Updated this week
- ⭐ 271 · Updated last month
- Pipeline Parallelism for PyTorch ⭐ 769 · Updated 10 months ago
- ⭐ 411 · Updated last year
- Scalable toolkit for efficient model alignment ⭐ 825 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ⭐ 77 · Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… ⭐ 2,548 · Updated this week
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ⭐ 1,259 · Updated 4 months ago
- Large Context Attention ⭐ 718 · Updated 5 months ago
- GPTQ inference Triton kernel ⭐ 302 · Updated 2 years ago
- Fast Inference Solutions for BLOOM ⭐ 564 · Updated 9 months ago
- Serving multiple LoRA-finetuned LLMs as one ⭐ 1,073 · Updated last year
- FP16×INT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens. ⭐ 855 · Updated 10 months ago
- Microsoft Automatic Mixed Precision Library ⭐ 612 · Updated 9 months ago
- Tutorials and examples for Triton Inference Server ⭐ 732 · Updated last month
- Scalable data pre-processing and curation toolkit for LLMs ⭐ 1,019 · Updated last week
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ⭐ 2,031 · Updated 2 weeks ago
- Distributed trainer for LLMs ⭐ 578 · Updated last year
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ⭐ 205 · Updated last week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ⭐ 255 · Updated this week
- JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ⭐ 354 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs ⭐ 264 · Updated 9 months ago
- Easy and Efficient Quantization for Transformers ⭐ 198 · Updated 3 weeks ago
- A family of compressed models obtained via pruning and knowledge distillation ⭐ 344 · Updated 8 months ago
- Zero Bubble Pipeline Parallelism ⭐ 406 · Updated 2 months ago
- A throughput-oriented high-performance serving framework for LLMs ⭐ 840 · Updated last week
- Latency and Memory Analysis of Transformer Models for Training and Inference ⭐ 434 · Updated 2 months ago