NVIDIA / NeMo-Framework-Launcher
Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native.
☆490 · Updated 2 weeks ago
Alternatives and similar repositories for NeMo-Framework-Launcher:
Users who are interested in NeMo-Framework-Launcher are comparing it to the libraries listed below.
- Scalable toolkit for efficient model alignment ☆715 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆286 · Updated 2 weeks ago
- The Triton TensorRT-LLM Backend ☆774 · Updated this week
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,191 · Updated 4 months ago
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens ☆708 · Updated 5 months ago
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs… (see the FP8 sketch after this list) ☆2,167 · Updated this week
- Minimalistic large language model 3D-parallelism training ☆1,445 · Updated this week
- Large Context Attention ☆681 · Updated 3 weeks ago
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization ☆675 · Updated 6 months ago
- Fast Inference Solutions for BLOOM ☆563 · Updated 4 months ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆1,364 · Updated 10 months ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed (see the pipeline sketch after this list). ☆1,960 · Updated last week
- LLMPerf is a library for validating and benchmarking LLMs ☆735 · Updated 2 months ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,787 · Updated last year
- Serving multiple LoRA fine-tuned LLMs as one ☆1,025 · Updated 9 months ago
- ☆410 · Updated last year
- A throughput-oriented high-performance serving framework for LLMs ☆733 · Updated 4 months ago
- Scalable data pre-processing and curation toolkit for LLMs ☆783 · Updated this week
- Microsoft Automatic Mixed Precision Library ☆564 · Updated 4 months ago
- A tool to configure, launch and manage your machine learning experiments. ☆117 · Updated this week
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆1,980 · Updated last week
- ☆223 · Updated this week
- This repository contains tutorials and examples for Triton Inference Server ☆642 · Updated this week
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models (see the smoothing sketch after this list) ☆1,335 · Updated 7 months ago
- Distributed trainer for LLMs ☆557 · Updated 8 months ago
- OpenAI-compatible API for the TensorRT-LLM Triton backend (see the client sketch after this list) ☆191 · Updated 6 months ago
- Pipeline Parallelism for PyTorch ☆746 · Updated 5 months ago
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processors (HPU) ☆169 · Updated this week
- ☆448 · Updated last year
- GPTQ inference Triton kernel ☆295 · Updated last year
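
The Transformer Engine entry above refers to NVIDIA's library for FP8 execution on Hopper and Ada GPUs. Below is a minimal sketch of its FP8 autocast pattern; it assumes `transformer_engine` is installed and a supported CUDA GPU is available, and the layer sizes are arbitrary placeholders (recipe arguments may differ by version).

```python
# Minimal FP8 sketch with Transformer Engine; assumes a Hopper/Ada GPU and that
# transformer_engine is installed. Sizes below are arbitrary, not from the listing.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

layer = te.Linear(1024, 1024, bias=True).cuda()       # drop-in replacement for nn.Linear
x = torch.randn(16, 1024, device="cuda")

# Delayed-scaling recipe: HYBRID means E4M3 forward, E5M2 backward.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)                                       # the GEMM runs in FP8
y.sum().backward()                                     # gradients flow as usual
```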
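For the DeepSpeed-MII entry, here is a hedged sketch of its non-persistent pipeline API; the model name is only an example and a CUDA GPU is assumed, so treat the exact call as an approximation of the project's README usage rather than a verified deployment recipe.

```python
# Hedged sketch of DeepSpeed-MII's pipeline API; the model name is a placeholder
# and a CUDA-capable GPU is assumed.
import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")    # loads the model with DeepSpeed inference kernels
responses = pipe(["DeepSpeed is", "Low-latency serving means"], max_new_tokens=64)
print(responses)                                     # one generated completion per prompt
```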
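The SmoothQuant entry is about migrating activation outliers into the weights before quantization. The toy sketch below reproduces just that smoothing step with the paper's per-channel factor s_j = max|X_j|^α / max|W_j|^(1−α) and α = 0.5; the tensors are random stand-ins and no actual INT8 quantization is performed.

```python
# Toy illustration of SmoothQuant's smoothing step: scale activation outliers into
# the weights so both tensors become easier to quantize. Shapes are arbitrary.
import torch

X = torch.randn(8, 512) * torch.logspace(-1, 2, 512)   # activations with outlier channels
W = torch.randn(512, 512)                               # weights, laid out (in_features, out_features)
alpha = 0.5                                             # the paper's default migration strength

s = X.abs().amax(dim=0).pow(alpha) / W.abs().amax(dim=1).pow(1 - alpha)
X_smooth = X / s                                        # flatten the activation range per channel
W_smooth = W * s.unsqueeze(1)                           # fold the same scale into the weights

# The product is mathematically unchanged, but X_smooth has a much flatter dynamic range.
assert torch.allclose(X @ W, X_smooth @ W_smooth, rtol=1e-3)
```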
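Finally, for the OpenAI-compatible gateway in front of the Triton TensorRT-LLM backend, the snippet below uses the standard `openai` Python client pointed at a local endpoint; the URL, port, and `model` name are placeholders that depend entirely on how the server was deployed.

```python
# Hedged client-side sketch: the base_url, api_key, and model name are placeholders
# for whatever the OpenAI-compatible TensorRT-LLM gateway actually exposes.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="ensemble",                                   # model name as registered on the server
    messages=[{"role": "user", "content": "Summarize TensorRT-LLM in one sentence."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```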