janhq / cortex.tensorrt-llm
Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA’s TensorRT-LLM as a Git submodule for GPU-accelerated inference on NVIDIA GPUs.
☆42 Updated 4 months ago
Alternatives and similar repositories for cortex.tensorrt-llm:
Users who are interested in cortex.tensorrt-llm are comparing it to the libraries listed below.
- Tcurtsni: Reverse Instruction Chat, ever wonder what your LLM wants to ask you? ☆22 Updated 7 months ago
- Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper. ☆20 Updated this week
- Gradio based tool to run opensource LLM models directly from Huggingface ☆91 Updated 7 months ago
- ☆24 Updated 3 weeks ago
- cli tool to quantize gguf, gptq, awq, hqq and exl2 models ☆69 Updated 2 months ago
- After my server ui improvements were successfully merged, consider this repo a playground for experimenting, tinkering and hacking around… ☆56 Updated 6 months ago
- ☆53 Updated 8 months ago
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others. ☆29 Updated this week
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre… ☆31 Updated 7 months ago
- ☆102 Updated 3 months ago
- This is the Mixture-of-Agents (MoA) concept, adapted from the original work by TogetherAI. My version is tailored for local model usage a… ☆110 Updated 7 months ago
- Easy-to-use, high-performance knowledge distillation for LLMs ☆48 Updated last month
- ☆28 Updated 4 months ago
- automatically quant GGUF models ☆154 Updated this week
- Testing LLM reasoning abilities with family relationship quizzes. ☆57 Updated 3 weeks ago
- cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a… ☆36 Updated this week
- ☆65 Updated 8 months ago
- An unsupervised model merging algorithm for Transformers-based language models. ☆105 Updated 9 months ago
- Realtime tts reading of large textfiles by your favourite voice. +Translation via LLM (Python script) ☆53 Updated 4 months ago
- A repository to store helpful information and emerging insights in regard to LLMs ☆20 Updated last year
- A proxy that hosts multiple single-model runners such as LLama.cpp and vLLM ☆12 Updated 2 months ago
- ☆91 Updated last month
- run ollama & gguf easily with a single command ☆49 Updated 9 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours ☆58 Updated 6 months ago
- A combination of Oobabooga's fork and the main cuda branch of GPTQ-for-LLaMa in a package format. ☆22 Updated last year
- ☆14 Updated 5 months ago
- idea: https://github.com/nyxkrage/ebook-groupchat/ ☆85 Updated 6 months ago
- ☆43 Updated 3 months ago
- Loader extension for tabbyAPI in SillyTavern ☆25 Updated 6 months ago