NVIDIA / trt-llm-as-openai-windows
This reference project can be used with any existing OpenAI-integrated app to run TRT-LLM inference locally on a GeForce GPU on Windows instead of in the cloud.
⭐127 · Updated last year
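Because the server exposes an OpenAI-compatible API, an existing app typically only needs its base URL pointed at the local endpoint. Below is a minimal sketch using the official `openai` Python client; the port, path, API-key placeholder, and model name are assumptions for illustration, not values taken from this repository.

```python
# Minimal sketch: point the standard OpenAI client at a local
# TRT-LLM server instead of api.openai.com.
# base_url, api_key placeholder, and model name are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local endpoint
    api_key="not-needed",                 # local server; key is unused
)

response = client.chat.completions.create(
    model="llama2",  # hypothetical model name served locally
    messages=[{"role": "user", "content": "Summarize TensorRT-LLM in one sentence."}],
)
print(response.choices[0].message.content)
```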
Alternatives and similar repositories for trt-llm-as-openai-windows
Users interested in trt-llm-as-openai-windows are comparing it to the libraries listed below.
- Some simple scripts that I use day-to-day when working with LLMs and Huggingface Hub ⭐161 · Updated 2 years ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ⭐139 · Updated last year
- A pipeline parallel training script for LLMs. ⭐165 · Updated 8 months ago
- A collection of all available inference solutions for the LLMs ⭐94 · Updated 10 months ago
- Automatically quantize GGUF models ⭐219 · Updated 2 weeks ago
- ⭐165 · Updated 5 months ago
- This is our own implementation of 'Layer Selective Rank Reduction' ⭐240 · Updated last year
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes… ⭐146 · Updated 2 years ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ⭐64 · Updated 2 years ago
- run ollama & gguf easily with a single command ⭐52 · Updated last year
- Low-Rank adapter extraction for fine-tuned transformers models ⭐180 · Updated last year
- GPT-4 Level Conversational QA Trained In a Few Hours ⭐66 · Updated last year
- Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app… ⭐169 · Updated last year
- The NVIDIA RTX™ AI Toolkit is a suite of tools and SDKs for Windows developers to customize, optimize, and deploy AI models across RTX PC… ⭐180 · Updated last month
- ⭐108 · Updated 4 months ago
- ⭐119 · Updated last year
- ⭐68 · Updated last year
- An innovative library for efficient LLM inference via low-bit quantization ⭐351 · Updated last year
- ⭐206 · Updated last year
- ⭐198 · Updated last year
- Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe… ⭐157 · Updated last year
- Let's create synthetic textbooks together :) ⭐75 · Updated last year
- ⭐101 · Updated last year
- ⭐138 · Updated 4 months ago
- ⭐51 · Updated last year
- FineTune LLMs in few lines of code (Text2Text, Text2Speech, Speech2Text) ⭐246 · Updated last year
- Maybe the new state of the art vision model? we'll see 🤷 ⭐170 · Updated 2 years ago
- An unsupervised model merging algorithm for Transformers-based language models. ⭐108 · Updated last year
- OpenAI compatible API for TensorRT LLM triton backend ⭐218 · Updated last year
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ⭐201 · Updated last year