NVIDIA / NeMo-Framework-Launcher
Provides end-to-end model development pipelines for LLMs and multimodal models that can be launched on-prem or in cloud-native environments.
☆509 · Updated 8 months ago
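As a rough orientation, the launcher is driven by a Hydra-style entry point at the repo root; the sketch below shows how a training stage is typically kicked off. The override names (`stages`, `training`, `launcher_scripts_path`, `cluster`) follow the repository's README pattern, but the specific config values and paths here are illustrative assumptions to adapt to your setup.

```python
# Hypothetical launch sketch for NeMo-Framework-Launcher (Hydra-style CLI).
# Override names follow the repo's README pattern; paths/values are placeholders.
import subprocess

cmd = [
    "python", "main.py",                                 # launcher entry point at the repo root
    "stages=[training]",                                 # which pipeline stages to run
    "training=gpt3/126m",                                # a predefined model/training config
    "launcher_scripts_path=/path/to/launcher_scripts",   # placeholder path inside your checkout
    "cluster=bcm",                                       # assumed Slurm-based cluster type
]
subprocess.run(cmd, check=True)
```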
Alternatives and similar repositories for NeMo-Framework-Launcher
Users interested in NeMo-Framework-Launcher are comparing it to the libraries listed below.
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆325 · Updated 3 months ago
- The Triton TensorRT-LLM Backend ☆909 · Updated last week
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ☆203 · Updated this week
- Scalable toolkit for efficient model alignment ☆847 · Updated 2 months ago
- Fast Inference Solutions for BLOOM ☆565 · Updated last year
- ☆321 · Updated last week
- ☆413 · Updated 2 years ago
- Microsoft Automatic Mixed Precision Library ☆634 · Updated 3 weeks ago
- Large Context Attention ☆757 · Updated 2 months ago
- Serving multiple LoRA fine-tuned LLMs as one ☆1,128 · Updated last year
- Pipeline Parallelism for PyTorch ☆783 · Updated last year
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,310 · Updated 9 months ago
- A tool to configure, launch and manage your machine learning experiments ☆212 · Updated this week
- GPTQ inference Triton kernel ☆317 · Updated 2 years ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed (see the usage sketch after this list) ☆2,085 · Updated 5 months ago
- A throughput-oriented high-performance serving framework for LLMs ☆926 · Updated last month
- Easy and Efficient Quantization for Transformers ☆202 · Updated 6 months ago
- Latency and Memory Analysis of Transformer Models for Training and Inference (a worked memory estimate follows this list) ☆467 · Updated 8 months ago
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens ☆962 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs (see the inference sketch after this list) ☆267 · Updated 3 weeks ago
- Distributed trainer for LLMs ☆587 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆85 · Updated this week
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization ☆710 · Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆1,426 · Updated last year
- ☆122 · Updated last year
- This repository contains tutorials and examples for Triton Inference Server (client sketch after this list) ☆814 · Updated 2 weeks ago
- OpenAI-compatible API for the TensorRT-LLM Triton backend (query sketch after this list) ☆218 · Updated last year
- ☆206 · Updated 7 months ago
- Zero Bubble Pipeline Parallelism ☆443 · Updated 7 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components ☆216 · Updated 2 weeks ago
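For the DeepSpeed-MII entry above, a minimal sketch of a non-persistent text-generation pipeline using MII's `mii.pipeline` API; the model identifier is a placeholder for any Hugging Face causal LM that MII supports.

```python
# Minimal DeepSpeed-MII sketch: non-persistent text-generation pipeline.
# The model name is a placeholder; swap in any supported HF causal LM.
import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")         # loads model with DeepSpeed inference kernels
responses = pipe(["DeepSpeed is"], max_new_tokens=64)    # batched generation over a list of prompts
print(responses)                                         # list of generated responses
```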
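The latency-and-memory-analysis entry derives numbers like the following from first principles. As a worked example, this sketch estimates inference KV-cache and weight memory for a hypothetical 7B-class model; the configuration values are illustrative assumptions, not taken from that repository.

```python
# First-principles memory estimate for transformer inference
# (illustrative 7B-class config; all values are assumptions).
layers, hidden = 32, 4096
seq_len, batch, dtype_bytes = 4096, 8, 2                  # fp16/bf16 = 2 bytes per element

# KV cache: 2 tensors (K and V) per layer, each of size batch * seq_len * hidden.
kv_bytes = 2 * layers * batch * seq_len * hidden * dtype_bytes
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")            # 16.0 GiB for this config

# Weights: parameter count * bytes per parameter (7e9 params in fp16).
print(f"Weights:  {7e9 * dtype_bytes / 2**30:.1f} GiB")   # ~13.0 GiB
```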
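The two "high-throughput and memory-efficient inference and serving engine" entries carry vLLM's tagline; a minimal offline-inference sketch against the vLLM Python API (the model name is a placeholder chosen for a quick smoke test):

```python
# Minimal vLLM offline-inference sketch; model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                      # small model for a quick smoke test
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)                         # first completion for the first prompt
```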
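For the Triton Inference Server tutorials entry, a minimal HTTP client sketch using the `tritonclient` package; the model name, tensor names, and shape are placeholders that must match your deployed model's config.pbtxt.

```python
# Minimal Triton HTTP client sketch; model/tensor names are placeholders
# that must match the deployed model's config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.ones((1, 16), dtype=np.float32)                 # dummy input batch
inp = httpclient.InferInput("INPUT0", data.shape, "FP32")
inp.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0"))                         # fetch the named output tensor
```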
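For the OpenAI-compatible TensorRT-LLM entry, querying such a server only requires the standard OpenAI client pointed at the local base URL; the port, route, and model name below are deployment-specific assumptions.

```python
# Querying an OpenAI-compatible endpoint with the standard OpenAI client.
# Base URL, port, and model name are deployment-specific assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="ensemble",                                     # Triton TensorRT-LLM deployments often expose "ensemble"
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```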