NVIDIA / trt-llm-as-openai-windows
This reference can be used with any existing OpenAI-integrated app to run TRT-LLM inference locally on a GeForce GPU on Windows instead of in the cloud.
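Because the local server speaks the same API as OpenAI's cloud endpoint, an existing app only needs to change the base URL it sends requests to. The sketch below builds such a request with the standard library; the host, port, and model name are assumptions, so check the actual server configuration before using them.

```python
# Minimal sketch of redirecting an OpenAI-style chat request to a local
# TRT-LLM server. The base URL and model name below are assumptions,
# not values taken from this repository.
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # hypothetical local endpoint


def build_chat_request(prompt: str, model: str = "llama-2-13b-chat"):
    """Build the same request an OpenAI-integrated app would send to the cloud."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Local servers typically ignore the API key, but many OpenAI
            # client libraries require one to be present.
            "Authorization": "Bearer not-needed-locally",
        },
    )


# To actually send the request (requires the local server to be running):
# with urllib.request.urlopen(build_chat_request("Hello!")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Apps built on an OpenAI client library can achieve the same effect by pointing the client's `base_url` setting at the local server rather than constructing requests by hand.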
Related projects:
- Low-rank adapter extraction for fine-tuned Transformers models
- A pipeline-parallel training script for LLMs
- A fast batching API for serving LLMs
- Simple day-to-day scripts for working with LLMs and the Hugging Face Hub
- An independent implementation of "Layer-Selective Rank Reduction"
- Automatically quantize GGUF models
- Banishing LLM Hallucinations Requires Rethinking Generalization
- An open-source toolkit for LLM distillation
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs"
- Load multiple LoRA modules simultaneously and automatically switch to the combination of LoRA modules that generates the best answer
- A Gradio-based tool to run open-source LLMs directly from Hugging Face
- The NVIDIA RTX™ AI Toolkit is a suite of tools and SDKs for Windows developers to customize, optimize, and deploy AI models across RTX PCs
- Generate synthetic data using OpenAI, MistralAI, or AnthropicAI
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients
- Evaluate and enhance your LLM deployments for real-world inference needs
- A simple and fast server for GPTQ-quantized LLaMA inference
- A toolkit for attaching, training, saving, and loading new heads for transformer models
- A collection of available inference solutions for LLMs
- An unsupervised model-merging algorithm for Transformers-based language models
- A library for easily merging multiple LLM experts and efficiently training the merged LLM
- GPT-4-level conversational QA trained in a few hours
- A comparison of language-model inference engines
- Cortex.TensorRT-LLM is a C++ inference library that can be loaded by any server at runtime; it includes NVIDIA's TensorRT-LLM as a submodule for GPU acceleration
- A tutorial for building an LLM router
- A memory framework for large language models and agents