NVIDIA / ChatRTX
A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM
☆3,002 Updated 2 months ago
Alternatives and similar repositories for ChatRTX
Users who are interested in ChatRTX are comparing it to the repositories listed below.
- Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture. ☆3,216 Updated 3 weeks ago
- TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizati… ☆10,865 Updated this week
- Gemma open-weight LLM library, from Google DeepMind ☆3,434 Updated this week
- Home of StarCoder2! ☆1,922 Updated last year
- ☆979 Updated 4 months ago
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ☆2,167 Updated 8 months ago
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,218 Updated 3 weeks ago
- A Phi family of SLMs book for getting started with Phi models. Phi is a family of open-source AI models developed by Microsoft. Phi… ☆3,385 Updated this week
- OpenChat: Advancing Open-source Language Models with Imperfect Data ☆5,372 Updated 9 months ago
- Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python. ☆6,000 Updated 2 months ago
- Large Language Model Text Generation Inference ☆10,265 Updated this week
- Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native. ☆504 Updated 2 months ago
- Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.p… ☆1,265 Updated last month
- Modeling, training, eval, and inference code for OLMo ☆5,719 Updated last week
- SGLang is a fast serving framework for large language models and vision language models. ☆15,567 Updated this week
- Tools for merging pretrained large language models. ☆5,853 Updated last week
- Interact, analyze and structure massive text, image, embedding, audio and video datasets ☆1,737 Updated 2 weeks ago
- Build and run containers leveraging NVIDIA GPUs ☆3,365 Updated this week
- On-device AI across mobile, embedded and edge for PyTorch ☆2,980 Updated this week
- Llama-3 agents that can browse the web by following instructions and talking to you ☆1,407 Updated 6 months ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. ☆2,200 Updated last month
- Composable building blocks to build Llama Apps ☆7,864 Updated this week
- PyTorch native post-training library ☆5,296 Updated this week
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆6,591 Updated this week
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications. ☆3,059 Updated last week
- ☆2,973 Updated 9 months ago
- Simple, safe way to store and distribute tensors ☆3,334 Updated last week
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model ☆4,919 Updated 9 months ago
- LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath ☆9,423 Updated 3 weeks ago
- LM Studio CLI ☆3,250 Updated this week