menloresearch / cortex.tensorrt-llm
Cortex.Tensorrt-LLM is a C++ inference library that any server can load at runtime. It includes NVIDIA's TensorRT-LLM as a git submodule for GPU-accelerated inference on NVIDIA GPUs.
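The "loaded at runtime" design means the host server pulls the engine in as a shared library rather than linking it at build time. As a rough illustration of that mechanism only (not cortex.tensorrt-llm's actual API, whose entry points are not documented here), this Python sketch loads a shared library dynamically with `ctypes`, using the standard C math library as a stand-in for an engine `.so`:

```python
import ctypes
import ctypes.util

# Locate and load a shared library at runtime. libm is a hypothetical
# stand-in for an inference-engine shared object the server would load.
path = ctypes.util.find_library("m")
engine = ctypes.CDLL(path if path else "libm.so.6")

# Declare the signature of a symbol exported by the library, then call it,
# just as a server would resolve and invoke an engine's entry points.
engine.cos.restype = ctypes.c_double
engine.cos.argtypes = [ctypes.c_double]

print(engine.cos(0.0))  # prints 1.0
```

A C++ host would do the equivalent with `dlopen`/`dlsym`; the point is that the server and the inference engine stay decoupled until runtime.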
☆43 · Updated 5 months ago
Alternatives and similar repositories for cortex.tensorrt-llm:
Users interested in cortex.tensorrt-llm are comparing it to the libraries listed below.
- ☆81 · Updated 3 months ago
- Run Ollama & GGUF easily with a single command ☆49 · Updated 10 months ago
- ☆24 · Updated 2 months ago
- Tcurtsni: Reverse Instruction Chat; ever wonder what your LLM wants to ask you? ☆21 · Updated 8 months ago
- Lightweight continuous batching with OpenAI compatibility using Hugging Face Transformers, including T5 and Whisper ☆20 · Updated last week
- Gradio-based tool to run open-source LLM models directly from Hugging Face ☆91 · Updated 8 months ago
- 1.58-bit LLaMa model ☆82 · Updated 11 months ago
- Easily view and modify JSON datasets for large language models ☆71 · Updated 3 weeks ago
- ☆111 · Updated 3 months ago
- An OpenAI API-compatible LLM inference server based on ExLlamaV2 ☆25 · Updated last year
- Experimental LLM inference UX to aid in creative writing ☆113 · Updated 3 months ago
- Deploy your GGML models to Hugging Face Spaces with Docker and Gradio ☆36 · Updated last year
- ☆53 · Updated 9 months ago
- ☆66 · Updated 9 months ago
- An API for VoiceCraft ☆25 · Updated 8 months ago
- A fast batching API to serve LLMs ☆182 · Updated 10 months ago
- After my server UI improvements were successfully merged, consider this repo a playground for experimenting, tinkering, and hacking around… ☆56 · Updated 7 months ago
- Idea: https://github.com/nyxkrage/ebook-groupchat/ ☆86 · Updated 7 months ago
- Serving LLMs in the HF Transformers format via a PyFlask API ☆71 · Updated 6 months ago
- Automatically quantize GGUF models ☆163 · Updated this week
- All the world is a play, we are but actors in it. ☆47 · Updated this week
- Easy-to-use, high-performance knowledge distillation for LLMs ☆54 · Updated this week
- A Windows tool to query various LLM AIs; supports branched conversations, history, and summaries, among other features ☆29 · Updated this week
- A pipeline-parallel training script for LLMs ☆132 · Updated this week
- Accepts a Hugging Face model URL, then automatically downloads and quantizes it using bitsandbytes ☆38 · Updated last year
- A proxy that hosts multiple single-model runners such as llama.cpp and vLLM ☆12 · Updated 2 weeks ago
- ☆31 · Updated last year
- Create text chunks that end at natural stopping points without using a tokenizer ☆26 · Updated last week
- Entropix-style sampling + GUI ☆25 · Updated 4 months ago