menloresearch / cortex.tensorrt-llm
Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA's TensorRT-LLM as a submodule for GPU-accelerated inference on NVIDIA GPUs.
☆42 · Updated last year
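The description above says the library is loaded by a server at runtime rather than linked at build time. A minimal sketch of that pattern (dynamic loading of a shared inference library, here via Python's `ctypes` for illustration) might look like the following; the library filename is a hypothetical placeholder, not the actual artifact name shipped by cortex.tensorrt-llm:

```python
# Sketch of the runtime-loading pattern: a host server attempts to
# dlopen() an inference library and falls back gracefully if it is
# not installed. The filename below is a hypothetical placeholder.
import ctypes

def load_engine(path: str):
    """Try to load an inference library at runtime; return None if absent."""
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None

engine = load_engine("libcortex.tensorrt-llm.so")  # hypothetical name
if engine is None:
    print("inference library not present; server continues without it")
else:
    print("inference library loaded")
```

In the real library the server would then resolve the C entry points it needs from the loaded handle; the sketch only shows the load-or-fallback step.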
Alternatives and similar repositories for cortex.tensorrt-llm
Users interested in cortex.tensorrt-llm are comparing it to the libraries listed below.
- Automatically quantize GGUF models ☆204 · Updated last week
- This reference can be used with any existing OpenAI-integrated apps to run with TRT-LLM inference locally on a GeForce GPU on Windows inste… ☆125 · Updated last year
- ☆102 · Updated last month
- Run ollama & GGUF easily with a single command ☆52 · Updated last year
- A fast batching API to serve LLMs ☆187 · Updated last year
- After my server UI improvements were successfully merged, consider this repo a playground for experimenting, tinkering and hacking around… ☆53 · Updated last year
- Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper ☆28 · Updated 6 months ago
- 1.58-bit LLaMa model ☆82 · Updated last year
- ☆51 · Updated 7 months ago
- Tcurtsni: Reverse Instruction Chat; ever wonder what your LLM wants to ask you? ☆23 · Updated last year
- A simple experiment on letting two local LLMs have a conversation about anything! ☆111 · Updated last year
- ☆51 · Updated last year
- Train your own small BitNet model ☆75 · Updated 11 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2 ☆165 · Updated last year
- Gradio-based tool to run open-source LLM models directly from Huggingface ☆95 · Updated last year
- A proxy that hosts multiple single-model runners such as llama.cpp and vLLM ☆12 · Updated 4 months ago
- An extension that lets the AI take the wheel, allowing it to use the mouse and keyboard, recognize UI elements, and prompt itself :3...no… ☆127 · Updated 11 months ago
- An OpenAI API compatible LLM inference server based on ExLlamaV2 ☆25 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights ☆63 · Updated last year
- Serving LLMs in the HF-Transformers format via a PyFlask API ☆71 · Updated last year
- B-Llama3o: a Llama 3 with vision and audio understanding, as well as text, audio, and animation data output ☆26 · Updated last year
- Experimental LLM inference UX to aid in creative writing ☆122 · Updated 9 months ago
- ☆116 · Updated 9 months ago
- An unsupervised model merging algorithm for Transformers-based language models ☆106 · Updated last year
- AI-powered search tool offering content-based text and visual similarity search, system-wide ☆266 · Updated 4 months ago
- This small API downloads and exposes access to NeuML's txtai-wikipedia and full wikipedia datasets, taking in a query and returning full … ☆100 · Updated last month
- A pipeline parallel training script for LLMs ☆158 · Updated 5 months ago
- A simple speech-to-text and text-to-speech AI chatbot that can be run fully offline ☆45 · Updated last year
- Easily view and modify JSON datasets for large language models ☆83 · Updated 4 months ago
- Low-rank adapter extraction for fine-tuned transformer models ☆178 · Updated last year