menloresearch / cortex.tensorrt-llm
Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It pulls in NVIDIA's TensorRT-LLM as a submodule for accelerated inference on NVIDIA GPUs.
☆43 · Updated 8 months ago
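The listing gives no further detail, so here is a minimal sketch of the "loaded by any server at runtime" pattern the description refers to: a host process opening an inference engine as a shared library with `dlopen` and resolving a C-style factory symbol. The library name `libengine.so` and the `create_engine` entry point are hypothetical placeholders, not cortex.tensorrt-llm's actual exported API.

```cpp
// Sketch: dynamically load an inference engine at runtime (Linux, link with -ldl).
// "libengine.so" and "create_engine" are assumed names for illustration only.
#include <dlfcn.h>
#include <cstdio>

int main() {
    // A server would typically do this once at startup.
    void* handle = dlopen("libengine.so", RTLD_LAZY);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    // Resolve a C-style factory function exported by the library.
    using CreateFn = void* (*)();
    auto create_engine = reinterpret_cast<CreateFn>(dlsym(handle, "create_engine"));
    if (!create_engine) {
        std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    void* engine = create_engine();  // the server would now route requests to this engine
    (void)engine;

    dlclose(handle);
    return 0;
}
```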
Alternatives and similar repositories for cortex.tensorrt-llm
Users interested in cortex.tensorrt-llm are comparing it to the libraries listed below.
- LLM inference in C/C++ · ☆21 · Updated 2 months ago
- ☆90 · Updated 5 months ago
- Gradio-based tool to run open-source LLMs directly from Hugging Face · ☆91 · Updated 11 months ago
- Testing LLM reasoning abilities with family relationship quizzes. · ☆61 · Updated 4 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. · ☆30 · Updated 2 months ago
- ☆48 · Updated 3 months ago
- A fast batching API for serving LLMs · ☆181 · Updated last year
- An OpenAI-API-compatible LLM inference server based on ExLlamaV2. · ☆25 · Updated last year
- Tcurtsni: Reverse Instruction Chat; ever wonder what your LLM wants to ask you? · ☆22 · Updated 11 months ago
- Very basic framework for composable, parameterized (Q)LoRA / (Q)DoRA fine-tuning of large language models using mlx, mlx_lm, and OgbujiPT. · ☆40 · Updated 3 months ago
- Lightweight continuous-batching OpenAI compatibility layer using Hugging Face Transformers, including T5 and Whisper. · ☆23 · Updated 2 months ago
- AirLLM 70B inference with a single 4GB GPU · ☆13 · Updated 9 months ago
- ☆66 · Updated last year
- Demo of an "always-on" AI assistant. · ☆24 · Updated last year
- 1.58-bit LLaMa model