janhq / cortex.tensorrt-llm
Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA's TensorRT-LLM as a submodule for GPU-accelerated inference on NVIDIA GPUs.
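"Loaded by any server at runtime" typically means the library ships as a shared object that a host process opens dynamically. Below is a minimal C++ sketch of that pattern using `dlopen`/`dlsym`; the library filename `libcortex_tensorrtllm.so` and the exported factory symbol `create_engine` are illustrative assumptions, not cortex's actual interface.

```cpp
// Minimal sketch of a server loading an inference library at runtime.
// NOTE: the .so name and the "create_engine" symbol are hypothetical
// placeholders, not cortex.tensorrt-llm's real API.
#include <dlfcn.h>
#include <cstdio>

int main() {
  // Open the shared library at runtime.
  void* handle = dlopen("./libcortex_tensorrtllm.so", RTLD_LAZY);
  if (!handle) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return 1;
  }

  // Resolve a hypothetical factory function exported by the library.
  using create_engine_fn = void* (*)();
  auto create_engine =
      reinterpret_cast<create_engine_fn>(dlsym(handle, "create_engine"));
  if (!create_engine) {
    std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
    dlclose(handle);
    return 1;
  }

  void* engine = create_engine();  // the server would now route requests here
  (void)engine;

  dlclose(handle);
  return 0;
}
```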
Related projects
Alternatives and complementary repositories for cortex.tensorrt-llm
- Automatically quantize GGUF models
- Real-time TTS reading of large text files in your favourite voice, plus translation via LLM (Python script)
- A fast batching API to serve LLMs
- A pipeline-parallel training script for LLMs
- Simple examples using Argilla tools to build AI
- HTTP proxy for on-demand model loading with llama.cpp (or other OpenAI-compatible backends)
- Tcurtsni: reverse-instruction chat; ever wonder what your LLM wants to ask you?
- LLM-Training-API: includes embeddings & rerankers, mergekit, LaserRMT
- Easily view and modify JSON datasets for large language models
- After my server UI improvements were successfully merged, consider this repo a playground for experimenting, tinkering and hacking around…
- Gradio-based tool to run open-source LLMs directly from Hugging Face
- Serving LLMs in the HF Transformers format via a PyFlask API
- A simple experiment on letting two local LLMs have a conversation about anything!
- All the world is a play; we are but actors in it.
- This is the Mixture-of-Agents (MoA) concept, adapted from the original work by TogetherAI. My version is tailored for local model usage a…
- An extension that lets the AI take the wheel, allowing it to use the mouse and keyboard, recognize UI elements, and prompt itself :3...no…
- An unsupervised model-merging algorithm for Transformers-based language models
- Run Ollama & GGUF models easily with a single command
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching, using MLX
- One-click templates for running inference on language models
- AutoNL: a natural-language automation tool
- Distributed inference for MLX LLMs
- GPT-4-level conversational QA trained in a few hours