A high-throughput and memory-efficient inference and serving engine for LLMs
☆26 Jun 9, 2026 Updated 3 weeks ago
Alternatives and similar repositories for upstreaming-to-vllm
Users that are interested in upstreaming-to-vllm are comparing it to the libraries listed below.
- ☆19 Jun 25, 2026 Updated last week
- ☆38 May 14, 2026 Updated last month
- AWS Neuron Deep Learning Containers (DLCs) are a set of Docker images for training and serving models on AWS Trainium and Inferentia inst… ☆22 Jun 19, 2026 Updated 2 weeks ago
- Project showing how to develop NKI kernels for Llama 3.2 1B inference ☆21 May 29, 2025 Updated last year
- ☆111 Jan 16, 2025 Updated last year
- ☆24 Jun 4, 2026 Updated last month
- This repository features Amazon SageMaker Ground Truth and explains how to ingest raw 3D point cloud data, label it, train a 3D object de… ☆13 Jun 23, 2022 Updated 4 years ago
- This web based application enables developers to quickly unit test individual API calls for both Incapsula and SecureSphere, as well as p… ☆21 Sep 12, 2023 Updated 2 years ago
- Training and inference on AWS Trainium and Inferentia chips. ☆267 Jun 15, 2026 Updated 2 weeks ago
- ☆12 Mar 16, 2026 Updated 3 months ago
- ☆66 Apr 9, 2026 Updated 2 months ago
- Smart commit messages ☆18 Oct 25, 2024 Updated last year
- ☆13 Dec 19, 2025 Updated 6 months ago
- ☆18 May 7, 2026 Updated last month
- AutoQASM is an experimental module offering a quantum-imperative programming experience in Python for developing quantum programs. ☆22 Jun 23, 2026 Updated last week
- Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores (EuroSys'25) ☆15 Jul 17, 2025 Updated 11 months ago
- PyTorch implementation of Hinton's FF Algorithm with hard negatives sampling ☆15 Dec 19, 2022 Updated 3 years ago
- Comprehensive, scalable ML inference architecture using Amazon EKS, leveraging Graviton processors for cost-effective CPU-based inference… ☆19 Mar 12, 2026 Updated 3 months ago
- ☆27 Oct 25, 2023 Updated 2 years ago
- Run Haystack Pipelines on Ray ☆20 Oct 16, 2024 Updated last year
- ☆16 Jun 6, 2023 Updated 3 years ago
- ☆17 Apr 9, 2024 Updated 2 years ago
- Amazon Bedrock AI Karaoke is an interactive demonstration of Amazon Bedrock. Users complete the prompt with the microphone and choose the… ☆19 Jan 29, 2025 Updated last year
- An example Terraform repo that utilizes the upstream EKS blueprints project from AWS Integration and Automation. ☆14 May 11, 2022 Updated 4 years ago
- ☆17 Jun 10, 2026 Updated 3 weeks ago
- Amazon ECS Auto Scaling for GPU-based Machine Learning Workloads ☆19 Jan 29, 2024 Updated 2 years ago
- Drug effect prediction using neural network ☆26 Sep 14, 2020 Updated 5 years ago
- Simple and easy stable diffusion inference with LightningModule on GPU, CPU and MPS (Possibly all devices supported by Lightning). ☆16 Jul 27, 2023 Updated 2 years ago
- ☆26 Dec 27, 2023 Updated 2 years ago
- Terraform module for creating EKS clusters optimized for ClickHouse® with EBS and autoscaling ☁️ ☆27 Mar 25, 2026 Updated 3 months ago
- This repository aims to showcase how to fine-tune an FM model in an Amazon EKS cluster, using JupyterHub to provision notebooks and craft both… ☆53 Jun 17, 2025 Updated last year
- ☆14 Feb 24, 2023 Updated 3 years ago
- Integrating SSE with NVIDIA Triton Inference Server using a Python backend and Zephyr model. There is very little documentation on how to use … ☆10 May 29, 2024 Updated 2 years ago
- ☆26 Mar 15, 2024 Updated 2 years ago
- Backstage plugin for Argo Workflows ☆21 Oct 3, 2023 Updated 2 years ago
- Example code for AWS Neuron SDK developers building inference and training applications ☆161 May 20, 2026 Updated last month
- Helm Chart for deploying Spark history server in Amazon EKS for S3 Spark Event Logs ☆29 Apr 4, 2026 Updated 3 months ago
- Artifact evaluation for HPCA'24 paper Lightening-Transformer: A Dynamically-operated Optically-interconnected Photonic Transformer Accele… ☆11 Mar 3, 2024 Updated 2 years ago
- ☆74 Jun 26, 2024 Updated 2 years ago