NVIDIA / ChatRTX
A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM
☆2,951 · Updated 3 weeks ago
Alternatives and similar repositories for ChatRTX:
Users interested in ChatRTX are comparing it to the libraries listed below.
- TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations (a minimal sketch of this Python API follows the list). ☆10,294 · Updated this week
- ☆954 · Updated 2 months ago
- Yes, it's another chat over documents implementation... but this one is entirely local! ☆1,753 · Updated last month
- Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization, https://arxiv.org/pdf/2401.06118.pdf ☆1,249 · Updated last week
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,131 · Updated this week
- Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture. ☆3,015 · Updated last week
- Chatbot Ollama is an open source chat UI for Ollama. ☆1,700 · Updated last month
- High-speed Large Language Model Serving for Local Deployment ☆8,184 · Updated 2 months ago
- Run Mixtral-8x7B models in Colab or on consumer desktops ☆2,308 · Updated last year
- ☆1,417 · Updated last month
- Local AI API Platform ☆2,622 · Updated last week
- [ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct ☆2,016 · Updated 5 months ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,863 · Updated last year
- Stable Diffusion and Flux in pure C/C++ ☆4,023 · Updated last month
- Run GGUF models easily with a KoboldAI UI. One File. Zero Install. ☆7,119 · Updated this week
- Reaching LLaMA2 Performance with 0.1M Dollars ☆981 · Updated 9 months ago
- A collection of standardized JSON descriptors for Large Language Model (LLM) files. ☆795 · Updated 8 months ago
- Gemma open-weight LLM library, from Google DeepMind ☆3,201 · Updated last week
- Lightweight, standalone C++ inference engine for Google's Gemma models. ☆6,352 · Updated last week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference (a hedged quantization sketch follows this list). Documentation: ☆2,104 · Updated 2 weeks ago
- Run any Llama 2 locally with a Gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local llama2 backend… ☆1,959 · Updated last year
- Large Language Model Text Generation Inference ☆10,052 · Updated this week
- Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python. ☆5,930 · Updated 2 weeks ago
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ☆4,818 · Updated 2 weeks ago
- Modeling, training, eval, and inference code for OLMo ☆5,519 · Updated this week
- Large-scale LLM inference engine ☆1,395 · Updated this week
- Training LLMs with QLoRA + FSDP ☆1,472 · Updated 5 months ago
- PyTorch native post-training library ☆5,103 · Updated this week
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ☆8,432 · Updated 11 months ago
- AirLLM 70B inference with single 4GB GPU ☆5,758 · Updated 5 months ago
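
For orientation, the TensorRT-LLM entry above refers to a high-level Python API for defining and running LLMs. Below is a minimal sketch of that usage pattern, assuming the `tensorrt_llm` package is installed and a supported NVIDIA GPU is available; the model ID and sampling values are illustrative choices, not taken from the listing.

```python
# Minimal sketch of TensorRT-LLM's high-level Python API.
# Assumptions: `tensorrt_llm` is installed, a supported NVIDIA GPU is
# available, and the model ID below is an arbitrary illustrative choice.
from tensorrt_llm import LLM, SamplingParams

def main():
    # Builds (or loads a cached) TensorRT engine for the model.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # Run batched generation and print the first completion per prompt.
    for output in llm.generate(["What is retrieval augmented generation?"], params):
        print(output.outputs[0].text)

if __name__ == "__main__":
    main()
```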
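
Similarly, the AutoAWQ entry above describes 4-bit weight quantization via the AWQ algorithm. The sketch below follows AutoAWQ's quantize-then-save flow, assuming the `autoawq` and `transformers` packages are installed and a CUDA GPU is present; the model path, output directory, and config values are illustrative.

```python
# Hedged sketch of 4-bit AWQ quantization with AutoAWQ.
# Assumptions: `autoawq` and `transformers` are installed, a CUDA GPU is
# present, and the model/output paths are illustrative placeholders.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"  # illustrative source model
quant_path = "mistral-7b-awq"             # illustrative output directory

# 4-bit weights with group size 128 (a typical AWQ configuration).
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibrate on AutoAWQ's default dataset and quantize the weights.
model.quantize(tokenizer, quant_config=quant_config)

# Persist the quantized checkpoint and tokenizer for later inference.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```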