NVIDIA / trt-llm-as-openai-windows
This reference lets any existing OpenAI-integrated app run TRT-LLM inference locally on a GeForce GPU on Windows instead of in the cloud.
★120 · Updated last year
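Because the server exposes an OpenAI-compatible API, an existing client only needs its base URL pointed at the local machine. A minimal sketch using only the standard library; the host, port, and model name below are assumptions — the actual values depend on how the local server is launched:

```python
import json
import urllib.request

# Assumed local endpoint; adjust to match how the local
# OpenAI-compatible TRT-LLM server is actually started.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt: str, model: str = "local") -> dict:
    """Build an OpenAI-style /chat/completions request body."""
    return {
        "model": model,  # many local servers accept any model name
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt: str) -> str:
    """POST the request to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same idea applies to any OpenAI SDK: override the base URL and the rest of the application code stays unchanged.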
Alternatives and similar repositories for trt-llm-as-openai-windows:
Users interested in trt-llm-as-openai-windows are comparing it to the libraries listed below.
- ★113 · Updated 2 weeks ago
- ★66 · Updated 10 months ago
- Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models (★136 · Updated 8 months ago)
- ★53 · Updated 10 months ago
- ★199 · Updated last year
- A collection of all available inference solutions for LLMs (★86 · Updated last month)
- Experiments with inference on Llama (★104 · Updated 10 months ago)
- This is our own implementation of 'Layer Selective Rank Reduction' (★235 · Updated 10 months ago)
- Low-rank adapter extraction for fine-tuned transformers models (★171 · Updated 11 months ago)
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs (★266 · Updated this week)
- OpenAI-compatible API for the TensorRT-LLM Triton backend (★205 · Updated 8 months ago)
- ★75 · Updated last year
- Comparison of Language Model Inference Engines (★214 · Updated 4 months ago)
- A high-throughput and memory-efficient inference and serving engine for LLMs (★262 · Updated 6 months ago)
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (★86 · Updated this week)
- GPT-4 Level Conversational QA Trained In a Few Hours (★59 · Updated 8 months ago)
- Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe… (★149 · Updated last year)
- ★112 · Updated 4 months ago
- Deployment of a light and full OpenAI API for production with vLLM to support /v1/embeddings with all embeddings models (★42 · Updated 9 months ago)
- A pipeline-parallel training script for LLMs (★137 · Updated 3 weeks ago)
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes… (★147 · Updated last year)
- High-level library for batched embeddings generation, blazingly fast web-based RAG, and quantized index processing (★66 · Updated 5 months ago)
- Set of scripts to fine-tune LLMs (★37 · Updated last year)
- Data preparation code for the Amber 7B LLM (★88 · Updated 11 months ago)
- Some simple scripts that I use day-to-day when working with LLMs and the Hugging Face Hub (★160 · Updated last year)
- ★73 · Updated last year
- ★153 · Updated 9 months ago
- Inference server benchmarking tool (★53 · Updated 3 weeks ago)
- The NVIDIA RTX™ AI Toolkit is a suite of tools and SDKs for Windows developers to customize, optimize, and deploy AI models across RTX PC… (★149 · Updated 5 months ago)
- ★100 · Updated 7 months ago