NVIDIA / NeMo-Framework-Launcher
Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native.
☆501 · Updated 3 weeks ago
Alternatives and similar repositories for NeMo-Framework-Launcher:
Users interested in NeMo-Framework-Launcher are comparing it to the libraries listed below.
- Scalable toolkit for efficient model alignment ☆786 · Updated last week
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆186 · Updated this week
- ☆253 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆299 · Updated this week
- A tool to configure, launch and manage your machine learning experiments. ☆145 · Updated this week
- Pipeline Parallelism for PyTorch (the staging idea is sketched after this list) ☆764 · Updated 8 months ago
- The Triton TensorRT-LLM Backend ☆832 · Updated this week
- Microsoft Automatic Mixed Precision Library (the mixed-precision pattern it extends is sketched after this list) ☆595 · Updated 7 months ago
- ☆411 · Updated last year
- Large Context Attention ☆709 · Updated 3 months ago
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… (FP8 usage sketched after this list) ☆2,400 · Updated this week
- Serving multiple LoRA-finetuned LLMs as one (the shared-base trick is sketched after this list) ☆1,058 · Updated last year
- Latency and Memory Analysis of Transformer Models for Training and Inference (a back-of-envelope version appears after this list) ☆409 · Updated 3 weeks ago
- GPTQ inference Triton kernel ☆298 · Updated last year
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens (the dequantize-then-GEMM scheme is sketched after this list) ☆818 · Updated 8 months ago
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆194 · Updated this week
- This repository contains tutorials and examples for Triton Inference Server ☆695 · Updated 3 weeks ago
- PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments (a minimal binding is sketched after this list). ☆790 · Updated 2 months ago
- A library to analyze PyTorch traces (producing such a trace is sketched after this list). ☆368 · Updated last week
- OpenAI-compatible API for the TensorRT-LLM Triton backend (a client call is sketched after this list) ☆205 · Updated 9 months ago
- Fast Inference Solutions for BLOOM ☆561 · Updated 7 months ago
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization (the decomposition is sketched after this list) ☆687 · Updated 8 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs (a vLLM-style API sketch appears after this list) ☆69 · Updated this week
- Zero Bubble Pipeline Parallelism ☆387 · Updated this week
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding (the guess-and-verify core is sketched after this list) ☆1,246 · Updated 2 months ago
- nvidia-modelopt is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculat… ☆909 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆263 · Updated 7 months ago
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆162 · Updated last week
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ☆474 · Updated 2 weeks ago
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆244 · Updated this week
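A few of the entries above lend themselves to short illustrative sketches, collected here. Each one is a hedged approximation in plain Python/PyTorch rather than the project's canonical code, and every model name, endpoint, and constant is a placeholder unless stated otherwise.

For the Pipeline Parallelism for PyTorch entry: the core idea is to split a model into stages placed on different devices and stream micro-batches through them. A minimal sketch of that idea (not the library's own API), assuming two CUDA GPUs:

```python
import torch
import torch.nn as nn

# Minimal two-stage pipeline: each stage lives on its own device and
# activations are copied across as they flow forward. Real pipeline
# libraries additionally schedule micro-batches so stages work
# concurrently instead of idling.
stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.GELU()).to("cuda:0")
stage1 = nn.Sequential(nn.Linear(4096, 1024)).to("cuda:1")

def pipeline_forward(x: torch.Tensor) -> torch.Tensor:
    h = stage0(x.to("cuda:0"))     # stage 0 on GPU 0
    return stage1(h.to("cuda:1"))  # stage 1 on GPU 1

# Micro-batching sketch: because CUDA launches are asynchronous, stage 0
# can start on chunk i+1 while stage 1 still works on chunk i.
def pipeline_forward_microbatched(x: torch.Tensor, chunks: int = 4) -> torch.Tensor:
    return torch.cat([pipeline_forward(mb) for mb in x.chunk(chunks)])
```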
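The Microsoft Automatic Mixed Precision entry pushes mixed precision down to FP8; the stock `torch.amp` pattern it generalizes (standard PyTorch shown here, not that library's API) looks like:

```python
import torch

model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # rescales grads so FP16 doesn't underflow

x = torch.randn(32, 512, device="cuda")
target = torch.randn(32, 512, device="cuda")

optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = torch.nn.functional.mse_loss(model(x), target)
scaler.scale(loss).backward()  # backward on the scaled loss
scaler.step(optimizer)         # unscales grads, skips the step on inf/nan
scaler.update()
```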
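For the FP8 Transformer-acceleration entry: its public examples wrap drop-in layers in an FP8 autocast governed by a scaling recipe, roughly as below. Recipe arguments vary by version, and an FP8-capable GPU (Hopper/Ada or newer) is required.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Drop-in linear layer plus an FP8 autocast region; per-tensor scaling
# factors are managed by the recipe object.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)
layer = te.Linear(1024, 1024, bias=True).cuda()

x = torch.randn(16, 1024, device="cuda", dtype=torch.bfloat16)
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```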
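For the multi-LoRA serving entry: every adapter shares the frozen base weights, and only a tiny low-rank delta differs per request, so the expensive GEMM is paid once. A toy PyTorch illustration of that trick (not the repo's fused kernels):

```python
import torch

d, r = 1024, 16                # hidden size, LoRA rank
W = torch.randn(d, d)          # frozen base weight, shared by all adapters
adapters = {                   # per-adapter low-rank deltas: W_i = W + B_i @ A_i
    name: (torch.randn(r, d) * 0.01, torch.randn(d, r) * 0.01)
    for name in ("adapter_a", "adapter_b")
}

def forward(x: torch.Tensor, adapter: str) -> torch.Tensor:
    A, B = adapters[adapter]
    # Base GEMM is shared; each adapter adds only two skinny matmuls.
    return x @ W.T + (x @ A.T) @ B.T

batch = torch.randn(2, d)
y0 = forward(batch[:1], "adapter_a")  # each request can use a different adapter
y1 = forward(batch[1:], "adapter_b")
```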
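The latency-and-memory-analysis entry automates estimates of the kind worked out below by hand (plain arithmetic, not the tool's API); the figures assume a hypothetical 7B-parameter dense decoder served in FP16.

```python
# Rough first-order estimates for a dense decoder-only model.
n_params   = 7e9        # hypothetical 7B model
bytes_fp16 = 2

weight_mem_gb   = n_params * bytes_fp16 / 1e9  # ~14 GB of weights
flops_per_token = 2 * n_params                 # ~14 GFLOPs per token (forward)

# KV cache per token: 2 (K and V) x layers x heads x head_dim x bytes.
n_layers, n_heads, head_dim = 32, 32, 128
kv_bytes_per_token = 2 * n_layers * n_heads * head_dim * bytes_fp16
kv_mem_gb = kv_bytes_per_token * 4096 / 1e9    # ~2 GB for a 4k context

print(f"{weight_mem_gb:.0f} GB weights, {kv_mem_gb:.1f} GB KV @ 4k tokens")
```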
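For the FP16xINT4 kernel entry: the numerical scheme being accelerated, shown in slow motion (dequantize packed 4-bit weights, then an ordinary FP16 GEMM); the real kernel fuses these steps so the weights never materialize in FP16.

```python
import torch

def dequant_int4(packed: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Each uint8 holds two 4-bit weights; unpack, recenter to [-8, 7], rescale.
    lo = (packed & 0x0F).to(torch.float16) - 8
    hi = (packed >> 4).to(torch.float16) - 8
    w = torch.stack((lo, hi), dim=-1).flatten(-2)  # interleave the nibbles
    return w * scale                               # per-row scale (groups omitted)

packed = torch.randint(0, 256, (1024, 512), dtype=torch.uint8)  # 1024x1024 int4
scale = torch.rand(1024, 1, dtype=torch.float16) * 0.01
x = torch.randn(16, 1024, dtype=torch.float16)
y = x @ dequant_int4(packed, scale).T  # FP16 activations x dequantized INT4 weights
```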
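The PyTriton entry binds a plain Python callable to a Triton server; the skeleton below follows the shape of the project's published examples (the model name and tensor names are illustrative).

```python
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

@batch
def infer_fn(input__0):
    # Receives a batched numpy array; any Python/GPU code can run here.
    return {"output__0": input__0 * 2}

with Triton() as triton:
    triton.bind(
        model_name="Doubler",  # illustrative name
        infer_func=infer_fn,
        inputs=[Tensor(name="input__0", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="output__0", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=64),
    )
    triton.serve()  # blocks, serving HTTP/gRPC like a regular Triton instance
```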
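The trace-analysis entry consumes standard PyTorch profiler traces; producing one takes only stock `torch.profiler` calls:

```python
import torch
from torch.profiler import ProfilerActivity, profile

model = torch.nn.Linear(256, 256)
x = torch.randn(64, 256)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    model(x)

prof.export_chrome_trace("trace.json")  # the JSON trace such analyzers ingest
```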
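An OpenAI-compatible server, like the TensorRT-LLM-backend one above, is driven with the stock `openai` client simply by pointing `base_url` at it; the endpoint, model name, and key below are placeholders.

```python
from openai import OpenAI

# Placeholder endpoint/model; any OpenAI-compatible server works the same way.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```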
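SqueezeLLM's dense-and-sparse idea splits each weight matrix into a heavily quantizable dense part plus a small sparse matrix that keeps the rare outliers in full precision. A toy decomposition (quantization of the dense part omitted):

```python
import torch

W = torch.randn(512, 512)
W[torch.rand_like(W) < 0.001] *= 50        # inject rare outliers

thresh = W.abs().quantile(0.999)           # keep the top 0.1% as sparse outliers
outlier_mask = W.abs() > thresh
W_sparse = (W * outlier_mask).to_sparse()  # full-precision outliers, very sparse
W_dense = W * ~outlier_mask                # outlier-free, easy to quantize low-bit

x = torch.randn(8, 512)
y = x @ W_dense.T + torch.sparse.mm(W_sparse, x.T).T  # dense path + sparse path
assert torch.allclose(y, x @ W.T, atol=1e-3)          # decomposition is exact
```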
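Both "high-throughput and memory-efficient inference and serving engine" entries carry the upstream vLLM tagline (they appear to be forks), so the relevant API shape is vLLM's offline interface; the model id below is a placeholder.

```python
from vllm import LLM, SamplingParams

# Placeholder model id; any HF-format causal LM the engine supports works.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```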
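Lookahead decoding breaks the one-token-per-step dependency by drafting several candidate tokens (from an n-gram pool it builds via Jacobi-style iteration) and verifying them in a single model call. Only the verification half is sketched here, with a toy stand-in for the model's forward pass:

```python
import torch

def logits_fn(tokens: torch.Tensor) -> torch.Tensor:
    # Stand-in for one transformer forward pass returning logits per position;
    # a real causal model conditions each position only on tokens to its left.
    g = torch.Generator().manual_seed(int(tokens.sum()))
    return torch.randn(tokens.shape[0], 100, generator=g)

def verify(prefix: torch.Tensor, draft: torch.Tensor) -> torch.Tensor:
    # One forward over prefix+draft scores every draft position at once;
    # accept the longest draft prefix that greedy decoding agrees with.
    seq = torch.cat([prefix, draft])
    preds = logits_fn(seq).argmax(-1)  # model's next-token choice at each position
    for i, tok in enumerate(draft):
        if preds[len(prefix) - 1 + i] != tok:  # first disagreement stops acceptance
            return draft[:i]
    return draft

accepted = verify(torch.arange(10), torch.tensor([3, 7, 7]))
```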