janhq / cortex.tensorrt-llm
Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA’s TensorRT-LLM as a submodule for GPU-accelerated inference on NVIDIA GPUs.
☆42 Updated 5 months ago
Alternatives and similar repositories for cortex.tensorrt-llm:
Users who are interested in cortex.tensorrt-llm are comparing it to the libraries listed below.
- Tcurtsni: Reverse Instruction Chat, ever wonder what your LLM wants to ask you? ☆21 Updated 8 months ago
- B-Llama3o: a Llama 3 with vision and audio understanding, as well as text, audio, and animation data output. ☆26 Updated 9 months ago
- A proxy that hosts multiple single-model runners such as llama.cpp and vLLM ☆12 Updated 2 months ago
- ☆79 Updated 2 months ago
- ☆24 Updated last month
- Easy-to-use, high-performance knowledge distillation for LLMs ☆50 Updated last month
- entropix-style sampling + GUI ☆25 Updated 4 months ago
- A fast batching API to serve LLM models ☆180 Updated 10 months ago
- Easily view and modify JSON datasets for large language models ☆71 Updated this week
- After my server ui improvements were successfully merged, consider this repo a playground for experimenting, tinkering and hacking around… ☆56 Updated 6 months ago
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others. ☆29 Updated this week
- GPT-4 Level Conversational QA Trained In a Few Hours ☆58 Updated 6 months ago
- Gradio-based tool to run open-source LLMs directly from Hugging Face ☆91 Updated 8 months ago
- ☆91 Updated last month
- cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a… ☆36 Updated this week
- Run Ollama & GGUF easily with a single command ☆49 Updated 9 months ago
- An unsupervised model merging algorithm for Transformers-based language models. ☆106 Updated 10 months ago
- Automatically quantize GGUF models ☆157 Updated this week
- Kosmos-2.5 is a cutting-edge Multimodal-LLM (MLLM) specializing in image OCR. However, its stringent software requirements & Python-scrip… ☆59 Updated 7 months ago
- Serving LLMs in the HF-Transformers format via a PyFlask API ☆69 Updated 5 months ago
- RAG implementation for Ooba characters. Dynamically spins up a new Qdrant vector DB and manages retrieval and commits for conversations ba… ☆47 Updated last year
- Lightweight continuous batching with OpenAI compatibility using Hugging Face Transformers, including T5 and Whisper. ☆20 Updated 2 weeks ago
- Unsloth Studio ☆65 Updated 4 months ago
- Local11Labs allows generating high-quality text-to-speech and podcast content using the fast and tiny Kokoro-82M. ☆45 Updated last month
- Something similar to Apple Intelligence? ☆59 Updated 8 months ago
- Demo of an "always-on" AI assistant. ☆24 Updated last year
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre… ☆31 Updated 7 months ago
- A simple experiment on letting two local LLMs have a conversation about anything! ☆104 Updated 8 months ago
- Local LLM inference & management server with built-in OpenAI API ☆31 Updated 10 months ago