NVIDIA / NeMo-Framework-Launcher
Provides end-to-end model development pipelines for LLMs and multimodal models that can be launched on-prem or on cloud-native infrastructure.
☆510 · Updated 9 months ago
Alternatives and similar repositories for NeMo-Framework-Launcher
Users interested in NeMo-Framework-Launcher are comparing it to the libraries listed below.
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆327 · Updated 4 months ago
- The Triton TensorRT-LLM Backend ☆918 · Updated this week
- Scalable toolkit for efficient model alignment ☆852 · Updated 4 months ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆205 · Updated this week
- Pipeline Parallelism for PyTorch ☆784 · Updated last year
- ☆328 · Updated this week
- Microsoft Automatic Mixed Precision Library ☆635 · Updated 2 months ago
- Large Context Attention ☆766 · Updated 3 months ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,316 · Updated 11 months ago
- ☆413 · Updated 2 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆85 · Updated this week
- Fast Inference Solutions for BLOOM ☆566 · Updated last year
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ☆2,092 · Updated 7 months ago
- A tool to configure, launch and manage your machine learning experiments. ☆216 · Updated this week
- Serving multiple LoRA finetuned LLM as one ☆1,140 · Updated last year
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… ☆3,132 · Updated this week
- A throughput-oriented high-performance serving framework for LLMs ☆945 · Updated 3 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated 2 months ago
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens. ☆1,005 · Updated last year
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆404 · Updated last month
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv… ☆504 · Updated this week
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆255 · Updated last week
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆1,431 · Updated last year
- This repository contains tutorials and examples for Triton Inference Server ☆819 · Updated this week
- Zero Bubble Pipeline Parallelism ☆449 · Updated 9 months ago
- Easy and Efficient Quantization for Transformers ☆204 · Updated last week
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆532 · Updated this week
- ☆125 · Updated last year
- GPTQ inference Triton kernel ☆321 · Updated 2 years ago
- Distributed trainer for LLMs ☆588 · Updated last year