menloresearch / cortex.tensorrt-llm
Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA's TensorRT-LLM as a submodule for GPU-accelerated inference on NVIDIA GPUs.
☆43 · Updated 9 months ago
Alternatives and similar repositories for cortex.tensorrt-llm
Users interested in cortex.tensorrt-llm are comparing it to the libraries listed below.
- Automatically quantize GGUF models ☆185 · Updated last week
- This reference can be used with any existing OpenAI-integrated apps to run with TRT-LLM inference locally on a GeForce GPU on Windows inste… ☆122 · Updated last year
- ☆95 · Updated 6 months ago
- A fast batching API to serve LLMs ☆183 · Updated last year
- Tcurtsni: reverse-instruction chat; ever wonder what your LLM wants to ask you? ☆22 · Updated last year
- ☆116 · Updated 8 months ago
- A more memory-efficient rewrite of the HF Transformers implementation of Llama for use with quantized weights ☆64 · Updated last year
- Gradio-based tool to run open-source LLMs directly from Huggingface ☆93 · Updated last year
- Run Ollama and GGUF models easily with a single command ☆52 · Updated last year
- 1.58-bit LLaMA model ☆81 · Updated last year
- A simple experiment in letting two local LLMs have a conversation about anything ☆110 · Updated last year
- LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI ☆127 · Updated 2 years ago
- Lightweight continuous-batching OpenAI-compatible server using HuggingFace Transformers, including T5 and Whisper ☆26 · Updated 4 months ago
- ☆52 · Updated last year
- Falcon LLM ggml framework with CPU and GPU support ☆246 · Updated last year
- Serving LLMs in the HF-Transformers format via a PyFlask API ☆71 · Updated 10 months ago
- A simple speech-to-text and text-to-speech AI chatbot that can run fully offline ☆45 · Updated last year
- PowerShell automation to rebuild llama.cpp for a Windows environment ☆32 · Updated last month
- After my server UI improvements were successfully merged, consider this repo a playground for experimenting, tinkering and hacking around… ☆54 · Updated 10 months ago
- Something similar to Apple Intelligence? ☆61 · Updated last year
- Low-rank adapter extraction for fine-tuned transformer models ☆173 · Updated last year
- ☆22 · Updated last year
- For inferring and serving local LLMs using the MLX framework ☆104 · Updated last year
- A Windows tool to query various LLM AIs; supports branched conversations, history, and summaries, among other features ☆33 · Updated this week
- vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs ☆87 · Updated last week
- Easily view and modify JSON datasets for large language models ☆77 · Updated 2 months ago
- ☆24 · Updated 5 months ago
- AirLLM: 70B inference with a single 4 GB GPU ☆14 · Updated 2 weeks ago
- Own your AI, search the web with it 🌐😎 ☆86 · Updated 6 months ago
- ☆66 · Updated last year