NVIDIA / NeMo-Framework-Launcher
Provides end-to-end model development pipelines for LLMs and multimodal models that can be launched on-prem or in cloud-native environments.
☆504 · Updated 2 months ago
Alternatives and similar repositories for NeMo-Framework-Launcher
Users interested in NeMo-Framework-Launcher are comparing it to the libraries listed below.
- Scalable toolkit for efficient model alignment ☆814 · Updated 3 weeks ago
- Large Context Attention ☆716 · Updated 5 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum… ☆304 · Updated 3 weeks ago
- Microsoft Automatic Mixed Precision Library ☆610 · Updated 8 months ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆188 · Updated this week
- The Triton TensorRT-LLM Backend ☆851 · Updated last week
- Pipeline Parallelism for PyTorch ☆768 · Updated 10 months ago
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell… (see the FP8 sketch after this list) ☆2,507 · Updated last week
- Serving multiple LoRA finetuned LLMs as one ☆1,066 · Updated last year
- A tool to configure, launch and manage your machine learning experiments. ☆161 · Updated this week
- ☆267 · Updated 2 weeks ago
- Zero Bubble Pipeline Parallelism ☆398 · Updated last month
- ☆411 · Updated last year
- Fast Inference Solutions for BLOOM ☆564 · Updated 8 months ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆1,397 · Updated last year
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome) ☆349 · Updated 2 weeks ago
- A throughput-oriented high-performance serving framework for LLMs ☆825 · Updated 3 weeks ago
- ☆250 · Updated 11 months ago
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆179 · Updated 2 weeks ago
- Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Server… ☆477 · Updated 2 weeks ago
- This repository contains tutorials and examples for Triton Inference Server ☆724 · Updated 2 weeks ago
- Latency and Memory Analysis of Transformer Models for Training and Inference ☆431 · Updated 2 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆521 · Updated last month
- Ring attention implementation with flash attention (see the ring-attention sketch after this list) ☆789 · Updated 2 weeks ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,258 · Updated 3 months ago
- A PyTorch Native LLM Training Framework ☆821 · Updated 5 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆204 · Updated this week
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆129 · Updated last month
- OpenAI-compatible API for the TensorRT-LLM Triton backend (see the request sketch after this list) ☆209 · Updated 10 months ago
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens. ☆846 · Updated 9 months ago
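
The Transformer Engine entry above accelerates Transformer layers by running matmuls in FP8 on supported GPUs. A minimal sketch of its documented usage pattern follows; it assumes a Hopper-or-newer GPU with the `transformer-engine` package installed, and the recipe arguments are defaults that may differ across versions.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling FP8 recipe; HYBRID uses E4M3 forward, E5M2 backward.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(1024, 1024, bias=True).cuda()
x = torch.randn(16, 1024, device="cuda", requires_grad=True)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)        # GEMM executes in FP8 with per-tensor scaling factors
y.sum().backward()      # backward is taken outside the autocast region
```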
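
The two ring-attention entries compute exact attention over sequences too long for one device by circulating K/V blocks around a ring of GPUs, with each device folding every incoming block into an online-softmax accumulator. The single-process sketch below shows only that accumulation step (the ring communication is elided); the function and chunking are illustrative, not either repository's API.

```python
import torch

def ring_attention(q, k, v, num_chunks):
    """Blockwise attention with online softmax; each chunk stands in for
    one K/V block that would arrive over the ring."""
    scale = q.shape[-1] ** -0.5
    m = torch.full((q.shape[0], 1), float("-inf"))  # running row max
    l = torch.zeros(q.shape[0], 1)                  # running softmax denominator
    o = torch.zeros_like(q)                         # running unnormalized output
    for kb, vb in zip(k.chunk(num_chunks), v.chunk(num_chunks)):
        s = (q @ kb.T) * scale
        m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
        corr = torch.exp(m - m_new)                 # rescale old accumulators
        p = torch.exp(s - m_new)
        l = l * corr + p.sum(dim=-1, keepdim=True)
        o = o * corr + p @ vb
        m = m_new
    return o / l

# Matches full attention up to floating-point error.
q, k, v = (torch.randn(128, 64) for _ in range(3))
ref = torch.softmax((q @ k.T) * 64 ** -0.5, dim=-1) @ v
assert torch.allclose(ring_attention(q, k, v, 8), ref, atol=1e-4)
```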
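
Several serving entries in this list (the Triton TensorRT-LLM backend and the OpenAI-compatible proxy for it) expose the now-standard OpenAI REST surface, so any HTTP client works against them. A hypothetical request is sketched below; the host, port, and model name are assumptions, not values from either repo.

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",   # assumed host/port
    json={
        "model": "ensemble",                  # assumed registered model name
        "prompt": "Explain pipeline parallelism in one sentence.",
        "max_tokens": 64,
        "temperature": 0.0,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```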