NVIDIA / NeMo-Framework-Launcher
Provides end-to-end model development pipelines for LLMs and multimodal models that can be launched on-prem or in cloud-native environments.
☆510 · Updated 9 months ago
Alternatives and similar repositories for NeMo-Framework-Launcher
Users interested in NeMo-Framework-Launcher also compare it to the libraries listed below.
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆327 · Updated 4 months ago
- The Triton TensorRT-LLM Backend ☆918 · Updated this week
- Scalable toolkit for efficient model alignment ☆852 · Updated 4 months ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆205 · Updated this week
- Pipeline Parallelism for PyTorch ☆784 · Updated last year
- Large Context Attention ☆766 · Updated 3 months ago
- A tool to configure, launch and manage your machine learning experiments. ☆216 · Updated this week
- Fast Inference Solutions for BLOOM ☆566 · Updated last year
- ☆413 · Updated 2 years ago
- ☆328 · Updated this week
- Serving multiple LoRA finetuned LLM as one ☆1,140 · Updated last year
- GPTQ inference Triton kernel ☆321 · Updated 2 years ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,315 · Updated 11 months ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ☆2,092 · Updated 7 months ago
- Microsoft Automatic Mixed Precision Library ☆635 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆85 · Updated this week
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆1,431 · Updated last year
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆219 · Updated this week
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆404 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated 2 months ago
- Distributed trainer for LLMs ☆588 · Updated last year
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆279 · Updated 2 months ago
- A throughput-oriented high-performance serving framework for LLMs ☆945 · Updated 3 months ago
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. ☆1,008 · Updated last year
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens. ☆1,005 · Updated last year
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆255 · Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… ☆3,132 · Updated this week
- Latency and Memory Analysis of Transformer Models for Training and Inference ☆478 · Updated 9 months ago
- Zero Bubble Pipeline Parallelism ☆449 · Updated 9 months ago
- Batched LoRAs ☆349 · Updated 2 years ago