NVIDIA / NeMo-Framework-Launcher
Provides end-to-end model development pipelines for LLMs and multimodal models that can be launched on-prem or in cloud-native environments.
★508 · Updated 8 months ago
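The launcher is driven by Hydra-style YAML configs and a single entry point. Below is a minimal sketch of kicking off a training stage from Python; the entry point (`main.py`) and the example overrides (`stages`, `training=gpt3/5b`, `cluster=bcm`) follow the repo's documented examples but should be treated as assumptions and verified against the launcher version in use.

```python
# Hedged sketch: invoking NeMo-Framework-Launcher's Hydra-driven entry point.
# The override names below (stages, training, cluster) mirror the repo's
# documented examples and are assumptions; verify them for your version.
import subprocess

cmd = [
    "python", "main.py",   # launcher_scripts/main.py in the repo
    "stages=[training]",   # pipeline stages to run (data prep, training, ...)
    "training=gpt3/5b",    # example model config shipped with the launcher
    "cluster=bcm",         # cluster profile (Slurm / Base Command Manager)
]
subprocess.run(cmd, check=True)
```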
Alternatives and similar repositories for NeMo-Framework-Launcher
Users interested in NeMo-Framework-Launcher are comparing it to the libraries listed below.
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ★326 · Updated 3 months ago
- The Triton TensorRT-LLM Backend ★914 · Updated this week
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ★204 · Updated this week
- Scalable toolkit for efficient model alignment ★847 · Updated 3 months ago
- ★324 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs (see the usage sketch after this list) ★86 · Updated this week
- ★412 · Updated 2 years ago
- Fast Inference Solutions for BLOOM ★566 · Updated last year
- Large Context Attention ★762 · Updated 3 months ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ★1,314 · Updated 10 months ago
- Microsoft Automatic Mixed Precision Library ★635 · Updated last month
- A tool to configure, launch and manage your machine learning experiments. ★214 · Updated this week
- Serving multiple LoRA finetuned LLMs as one ★1,134 · Updated last year
- Pipeline Parallelism for PyTorch ★784 · Updated last year
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens ★982 · Updated last year
- A throughput-oriented high-performance serving framework for LLMs ★937 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ★267 · Updated last month
- GPTQ inference Triton kernel ★316 · Updated 2 years ago
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… ★3,081 · Updated last week
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization ★713 · Updated last year
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ★528 · Updated this week
- Easy and Efficient Quantization for Transformers ★202 · Updated 6 months ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ★1,428 · Updated last year
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ★2,088 · Updated 6 months ago
- Latency and Memory Analysis of Transformer Models for Training and Inference ★475 · Updated 8 months ago
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments. ★833 · Updated 5 months ago
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ★502 · Updated this week
- This repository contains tutorials and examples for Triton Inference Server ★813 · Updated this week
- Zero Bubble Pipeline Parallelism ★447 · Updated 8 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ★218 · Updated this week
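Two of the entries above carry the vLLM-style "high-throughput and memory-efficient inference and serving engine" description. As a point of comparison, a minimal offline-inference sketch against the upstream vLLM Python API looks like the following; the model name is an arbitrary example, and forks listed here may diverge from this API.

```python
# Minimal sketch of offline batched inference with the upstream vLLM API.
# Forks listed above may differ; the model choice here is an example only.
from vllm import LLM, SamplingParams

prompts = [
    "The capital of France is",
    "Pipeline parallelism means",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")   # any Hugging Face causal LM
outputs = llm.generate(prompts, sampling_params)

for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```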