NVIDIA / ChatRTX
A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM
☆3,085 · Updated 8 months ago
Alternatives and similar repositories for ChatRTX
Users interested in ChatRTX are comparing it to the libraries listed below.
- Build and run containers leveraging NVIDIA GPUs ☆3,901 · Updated this week
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ☆2,167 · Updated last year
- Local AI API Platform ☆2,763 · Updated 5 months ago
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… ☆12,312 · Updated this week
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications. ☆3,154 · Updated 2 weeks ago
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,378 · Updated 3 months ago
- ☆1,011 · Updated 10 months ago
- PyTorch native post-training library ☆5,608 · Updated this week
- Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture. ☆3,637 · Updated last week
- Yes, it's another chat-over-documents implementation... but this one is entirely local! ☆1,805 · Updated 8 months ago
- Go ahead and axolotl questions ☆10,870 · Updated last week
- High-speed Large Language Model serving for local deployment ☆8,420 · Updated 4 months ago
- ☆1,837 · Updated last week
- On-device AI across mobile, embedded, and edge for PyTorch ☆3,634 · Updated this week
- ☆1,552 · Updated last year
- Run Mixtral-8x7B models in Colab or on consumer desktops ☆2,324 · Updated last year
- Calculate tokens/s & GPU memory requirements for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization ☆1,381 · Updated last year
- Olive: Simplify ML model fine-tuning, conversion, quantization, and optimization for CPUs, GPUs, and NPUs. ☆2,196 · Updated this week
- Python bindings for llama.cpp ☆9,800 · Updated 3 months ago
- A collection of standardized JSON descriptors for Large Language Model (LLM) files. ☆801 · Updated last year
- Training LLMs with QLoRA + FSDP ☆1,534 · Updated last year
- ☆3,039 · Updated 2 weeks ago
- Large-scale LLM inference engine ☆1,600 · Updated last week
- Foundational model for human-like, expressive TTS ☆4,197 · Updated last year
- Python bindings for the Transformer models implemented in C/C++ using the GGML library. ☆1,878 · Updated last year
- Multi-LoRA inference server that scales to thousands of fine-tuned LLMs ☆3,552 · Updated 6 months ago
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ☆5,001 · Updated 7 months ago
- ☆1,555 · Updated last month
- Granite Code Models: A Family of Open Foundation Models for Code Intelligence ☆1,241 · Updated 5 months ago
- Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference; more devices means faster inference. ☆2,755 · Updated last month