janhq / cortex.tensorrt-llm

Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA's TensorRT-LLM as a Git submodule for GPU-accelerated inference on NVIDIA GPUs.

☆42

Alternatives and similar repositories for cortex.tensorrt-llm:

Users interested in cortex.tensorrt-llm are comparing it to the libraries listed below:

the-crypt-keeper / tcurtsni
Tcurtsni: Reverse Instruction Chat. Ever wonder what your LLM wants to ask you?
☆21 · Updated 8 months ago
beyondExp / B-Llama3-o
B-Llama3o: a Llama 3 model with vision and audio understanding, as well as text, audio, and animation data output.
☆26 · Updated 9 months ago
pepijndevos / llama_multiserver
A proxy that hosts multiple single-model runners such as llama.cpp and vLLM
☆12 · Updated 2 months ago
chigkim / Ollama-MMLU-Pro
☆79 · Updated 2 months ago
rodrigobaron / anthill
☆24 · Updated last month
agokrani / distillKitPlus
Easy-to-use, high-performance knowledge distillation for LLMs
☆50 · Updated last month
EdwardDali / EntropixLab
Entropix-style sampling with a GUI
☆25 · Updated 4 months ago
epolewski / EricLLM
A fast batching API for serving LLMs
☆180 · Updated 10 months ago
LostRuins / datasetexplorer
Easily view and modify JSON datasets for large language models
☆71 · Updated this week
mounta11n / plusplus-camall
After my server UI improvements were successfully merged, consider this repo a playground for experimenting, tinkering and hacking around…
☆56 · Updated 6 months ago
stringandstickytape / MaxsAiStudio
A Windows tool for querying various LLMs. Supports branched conversations, history, and summaries, among other features.
☆29 · Updated this week
Cerebras / DocChat
GPT-4 Level Conversational QA Trained In a Few Hours
☆58 · Updated 6 months ago
Aesthisia / LLMinator
Gradio-based tool to run open-source LLMs directly from Hugging Face
☆91 · Updated 8 months ago
janhq / ichigo-demo
☆91 · Updated last month
janhq / cortex.llamacpp
cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a…
☆36 · Updated this week
monk1337 / auto-ollama
Run Ollama and GGUF models easily with a single command
☆49 · Updated 9 months ago
Gryphe / MergeMonster
An unsupervised model merging algorithm for Transformers-based language models.
☆106 · Updated 10 months ago
leafspark / AutoGGUF
Automatically quantize GGUF models
☆157 · Updated this week
abgulati / kosmos-2_5-containerized
Kosmos-2.5 is a cutting-edge multimodal LLM (MLLM) specializing in image OCR. However, its stringent software requirements & Python-scrip…
☆59 · Updated 7 months ago
abgulati / hf-waitress
Serving LLMs in the HF-Transformers format via a PyFlask API
☆69 · Updated 5 months ago
jason-brian-anderson / long_term_memory_with_qdrant
RAG implementation for Ooba characters. Dynamically spins up new qdrant vector DB and manages retrieval and commits for conversations ba…
☆47 · Updated last year
malaysia-ai / transformers-openai-api
Lightweight continuous batching with OpenAI API compatibility using Hugging Face Transformers, including T5 and Whisper.
☆20 · Updated 2 weeks ago
unslothai / unsloth-studio
Unsloth Studio
☆65 · Updated 4 months ago
nhaouari / local11labs
Local11Labs allows generating high-quality text-to-speech and podcast content using the fast and tiny Kokoro-82M.
☆45 · Updated last month
beratcmn / local-intelligence
Something similar to Apple Intelligence?
☆59 · Updated 8 months ago
AndrewVeee / assistant-demo
Demo of an "always-on" AI assistant.
☆24 · Updated last year
LAION-AI / Desktop_BUD-E
BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre…
☆31 · Updated 7 months ago
Fus3n / TwoAI
A simple experiment in letting two local LLMs have a conversation about anything!
☆104 · Updated 8 months ago
elgatopanzon / gatogpt
Local LLM inference & management server with built-in OpenAI API
☆31 · Updated 10 months ago