NVIDIA / NeMo-Framework-Launcher

Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native.

☆509

Alternatives and similar repositories for NeMo-Framework-Launcher

Users who are interested in NeMo-Framework-Launcher compare it to the libraries listed below.

huggingface / optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…
☆320 Updated 2 months ago
triton-inference-server / tensorrtllm_backend
The Triton TensorRT-LLM Backend
☆910 Updated last week
huggingface / optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
☆201 Updated this week
pytorch / PiPPy
Pipeline Parallelism for PyTorch
☆784 Updated last year
triton-inference-server / vllm_backend
☆319 Updated last week
NVIDIA / NeMo-Aligner
Scalable toolkit for efficient model alignment
☆847 Updated 2 months ago
triton-inference-server / fastertransformer_backend
☆413 Updated 2 years ago
Azure / MS-AMP
Microsoft Automatic Mixed Precision Library
☆628 Updated this week
haoliuhl / ringattention
Large Context Attention
☆753 Updated last month
huggingface / transformers-bloom-inference
Fast Inference Solutions for BLOOM
☆564 Updated last year
NVIDIA-NeMo / Run
A tool to configure, launch and manage your machine learning experiments.
☆209 Updated this week
triton-inference-server / tutorials
This repository contains tutorials and examples for Triton Inference Server
☆805 Updated 3 weeks ago
hao-ai-lab / LookaheadDecoding
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
☆1,307 Updated 9 months ago
punica-ai / punica
Serving multiple LoRA finetuned LLM as one
☆1,122 Updated last year
HabanaAI / vllm-fork
A high-throughput and memory-efficient inference and serving engine for LLMs
☆85 Updated this week
efeslab / Nanoflow
A throughput-oriented high-performance serving framework for LLMs
☆921 Updated last month
IST-DASLab / marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
☆958 Updated last year
deepspeedai / DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
☆2,080 Updated 5 months ago
HabanaAI / Model-References
Reference models for Intel(R) Gaudi(R) AI Accelerator
☆169 Updated 2 months ago
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆2,971 Updated this week
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆267 Updated last year
SqueezeAILab / SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
☆710 Updated last year
fpgaminer / GPTQ-triton
GPTQ inference Triton kernel
☆316 Updated 2 years ago
NetEase-FuXi / EETQ
Easy and Efficient Quantization for Transformers
☆203 Updated 5 months ago
triton-inference-server / model_analyzer
Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv…
☆500 Updated this week
huggingface / llm_training_handbook
An open collection of methodologies to help with successful training of large language models.
☆541 Updated last year
bigscience-workshop / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆1,426 Updated last year
triton-inference-server / pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
☆830 Updated 3 months ago
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆216 Updated last week
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆271 Updated last week