basetenlabs / stablelm-trussLinks

☆13

Alternatives and similar repositories for stablelm-truss

Users that are interested in stablelm-truss are comparing it to the libraries listed below

Sorting:

basetenlabs / starcoder-truss
Truss for deploying Starcoder to Baseten or other platforms
☆12Updated last year
basetenlabs / truss
The simplest way to serve AI/ML models in production
☆994Updated last week
dstackai / dstack
dstack is an open-source alternative to Kubernetes and Slurm, designed to simplify GPU allocation and AI workload orchestration for ML te…
☆1,797Updated this week
featureform / featureform
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
☆1,901Updated 3 weeks ago
ai-dynamo / dynamo
A Datacenter Scale Distributed Inference Serving Framework
☆4,197Updated this week
punica-ai / punica
Serving multiple LoRA finetuned LLM as one
☆1,062Updated last year
lexy-ai / lexy
Data pipelines for AI applications
☆12Updated last week
predibase / lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
☆2,996Updated 2 weeks ago
pytorch / ao
PyTorch native quantization and sparsity for training and inference
☆2,088Updated this week
tigr81 / tigr81
A Project Scaffolder for HUMANS.
☆10Updated 7 months ago
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆3,123Updated this week
ray-project / ray-llm
RayLLM - LLMs on Ray (Archived). Read README for more info.
☆1,261Updated 2 months ago
triton-inference-server / pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
☆796Updated 3 months ago
sematic-ai / sematic
An open-source ML pipeline development platform
☆988Updated 5 months ago
S-LoRA / S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
☆1,830Updated last year
coreweave / tensorizer
Module, Model, and Tensor Serialization/Deserialization
☆236Updated this week
efeslab / Nanoflow
A throughput-oriented high-performance serving framework for LLMs
☆815Updated 3 weeks ago
flexflow / flexflow-train
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
☆1,799Updated this week
turboderp-org / exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
☆4,202Updated this week
RunLLM / aqueduct
Aqueduct is no longer being maintained. Aqueduct allows you to run LLM and ML workloads on any cloud infrastructure.
☆520Updated 2 years ago
LMCache / LMCache
Redis for LLMs
☆1,243Updated this week
felafax / felafax
Felafax is building AI infra for non-NVIDIA GPUs
☆560Updated 4 months ago
deepspeedai / DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
☆2,019Updated 2 months ago
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆2,396Updated last week
IST-DASLab / marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
☆836Updated 9 months ago
triton-inference-server / tensorrtllm_backend
The Triton TensorRT-LLM Backend
☆845Updated this week
modal-labs / modal-client
Python client library for Modal
☆350Updated this week
Cornell-RelaxML / quip-sharp
☆539Updated 7 months ago
piercefreeman / vectordb-orm
An ORM library for vector databases
☆16Updated 2 years ago
intel / intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl…
☆2,170Updated 8 months ago