A high-throughput and memory-efficient inference and serving engine for LLMs
☆25 Mar 5, 2026 Updated last month
Alternatives and similar repositories for upstreaming-to-vllm
Users that are interested in upstreaming-to-vllm are comparing it to the libraries listed below.
- AWS Neuron Deep Learning Containers (DLCs) are a set of Docker images for training and serving models on AWS Trainium and Inferentia inst… ☆21 Updated this week
- ☆111 Jan 16, 2025 Updated last year
- ☆24 Mar 30, 2026 Updated 2 weeks ago
- This repository features Amazon SageMaker Ground Truth and explains how to ingest raw 3D point cloud data, label it, train a 3D object de… ☆13 Jun 23, 2022 Updated 3 years ago
- Training and inference on AWS Trainium and Inferentia chips. ☆263 Apr 3, 2026 Updated last week
- ☆64 Updated this week
- ☆62 Updated this week
- ☆13 Dec 19, 2025 Updated 3 months ago
- ☆14 May 19, 2023 Updated 2 years ago
- ☆18 Nov 4, 2024 Updated last year
- This repository will soon contain all scripts and links to the annotated corpora of Tibetan. ☆14 Feb 4, 2025 Updated last year
- AutoQASM is an experimental module offering a quantum-imperative programming experience in Python for developing quantum programs. ☆22 Updated this week
- PyTorch implementation of Hinton's FF Algorithm with hard negatives sampling ☆15 Dec 19, 2022 Updated 3 years ago
- An AWS Lambda function that converts any document format that LibreOffice can import to any document format that LibreOffice can export ☆25 Oct 12, 2023 Updated 2 years ago
- ☆12 Dec 20, 2025 Updated 3 months ago
- ☆14 Dec 20, 2023 Updated 2 years ago
- ☆26 Oct 25, 2023 Updated 2 years ago
- Run Haystack Pipelines on Ray ☆20 Oct 16, 2024 Updated last year
- Experimental Managed Delivery plugin to enable deployment of Kubernetes resources via Spinnaker's keel microservice. ☆13 Apr 11, 2022 Updated 4 years ago
- ☆16 Jun 6, 2023 Updated 2 years ago
- ☆17 Apr 9, 2024 Updated 2 years ago
- Deploy and manage a self-hosted LLM using EKS. ☆17 Jan 29, 2025 Updated last year
- Amazon Bedrock AI Karaoke is an interactive demonstration of Amazon Bedrock. Users complete the prompt with the microphone and choose the… ☆19 Jan 29, 2025 Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆27 Updated this week
- This GitHub repository hosts the artifacts for the AWS Containers blog on developing Twelve Factor Apps on ECS using Fargate. ☆23 May 1, 2023 Updated 2 years ago
- vLLM performance dashboard ☆44 Apr 26, 2024 Updated last year
- Amazon ECS Auto Scaling for GPU-based Machine Learning Workloads ☆19 Jan 29, 2024 Updated 2 years ago
- ☆20 Apr 24, 2022 Updated 3 years ago
- Because it's there. ☆16 Sep 22, 2024 Updated last year
- Question Answering Generative AI application with Large Language Models (LLMs) and Amazon OpenSearch Service ☆30 Mar 26, 2026 Updated 2 weeks ago
- Drug effect prediction using neural network ☆25 Sep 14, 2020 Updated 5 years ago
- Notebooks and sample code for Build On Trainium ☆47 Jan 14, 2026 Updated 3 months ago
- Forex Fair Value Gap Indicator for MT5 ☆13 Dec 11, 2024 Updated last year
- Data set of Finnish grey literature, containing curated Dublin Core style metadata and links to original PDF publications ☆33 Mar 27, 2026 Updated 2 weeks ago
- ☆26 Dec 27, 2023 Updated 2 years ago
- ☆14 Feb 24, 2023 Updated 3 years ago
- Example code for AWS Neuron SDK developers building inference and training applications ☆158 Apr 2, 2026 Updated last week
- Hands-on material for building a fictional web app, Xenn, with Cloud Run on Google Cloud. ☆12 Dec 6, 2024 Updated last year
- Artifact evaluation for HPCA'24 paper Lightening-Transformer: A Dynamically-operated Optically-interconnected Photonic Transformer Accele… ☆11 Mar 3, 2024 Updated 2 years ago