deepjavalibrary / djl-servingLinks

A universal scalable machine learning model deployment solution

☆241

Alternatives and similar repositories for djl-serving

Users that are interested in djl-serving are comparing it to the libraries listed below

Sorting:

deepjavalibrary / djl-demo
Demo applications showcasing DJL
☆342Updated last week
aws-neuron / transformers-neuronx
☆111Updated 10 months ago
huggingface / optimum-neuron
Training and inference on AWS Trainium and Inferentia chips.
☆250Updated last week
aws-neuron / upstreaming-to-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆23Updated last month
triton-inference-server / vllm_backend
☆319Updated last week
aws-neuron / aws-neuron-samples
Example code for AWS Neuron SDK developers building inference and training applications
☆151Updated this week
triton-inference-server / python_backend
Triton backend that enables pre-process, post-processing and other logic to be implemented in Python.
☆660Updated last week
triton-inference-server / tensorrtllm_backend
The Triton TensorRT-LLM Backend
☆910Updated last week
aws / sagemaker-huggingface-inference-toolkit
☆270Updated 7 months ago
ray-project / llmperf
LLMPerf is a library for validating and benchmarking LLMs
☆1,057Updated 11 months ago
triton-inference-server / client
Triton Python, C++ and Java client libraries, and GRPC-generated client examples for go, java and scala.
☆665Updated last week
awslabs / llm-hosting-container
Large Language Model Hosting Container
☆90Updated last month
ray-project / langchain-ray
Examples on how to use LangChain and Ray
☆232Updated 2 years ago
opensearch-project / k-NN
🆕 Find the k-nearest neighbors (k-NN) for your vector data
☆203Updated this week
aws-neuron / neuronx-distributed
☆63Updated 2 weeks ago
aws-samples / sagemaker-distributed-training-workshop
Hands-on workshop for distributed training and hosting on SageMaker
☆151Updated last month
triton-inference-server / tutorials
This repository contains tutorials and examples for Triton Inference Server
☆805Updated 3 weeks ago
deepjavalibrary / d2l-java
The Java implementation of Dive into Deep Learning (D2L.ai)
☆190Updated last week
triton-inference-server / model_analyzer
Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Serv…
☆500Updated this week
huggingface / optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…
☆320Updated 2 months ago
aws-neuron / aws-neuron-sdk
Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and i…
☆557Updated this week
triton-inference-server / model_navigator
Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.
☆213Updated 7 months ago
aws-samples / aws-samples-for-ray
☆73Updated last year
aws / sagemaker-pytorch-inference-toolkit
Toolkit for allowing inference and serving with PyTorch on SageMaker. Dockerfiles used for building SageMaker Pytorch Containers are at h…
☆141Updated last year
triton-inference-server / backend
Common source, scripts and utilities for creating Triton backends.
☆361Updated 3 weeks ago
run-llama / modal_finetune_sql
☆320Updated 2 years ago
triton-inference-server / pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
☆830Updated 3 months ago
aws / fmeval
Foundation Model Evaluations Library
☆272Updated 4 months ago
intel / llm-on-ray
Pretrain, finetune and serve LLMs on Intel platforms with Ray
☆130Updated 2 months ago
triton-inference-server / onnxruntime_backend
The Triton backend for the ONNX Runtime.
☆168Updated this week