NVIDIA / ChatRTX
A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM
☆2,951 · Updated 3 weeks ago
Alternatives and similar repositories for ChatRTX:
Users interested in ChatRTX are comparing it to the libraries listed below.
- TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations (a minimal sketch of this Python API follows the list). ☆10,294 · Updated this week
- ☆954 · Updated 2 months ago
- Yes, it's another chat over documents implementation... but this one is entirely local! ☆1,753 · Updated last month
- Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization, https://arxiv.org/pdf/2401.06118.pdf ☆1,249 · Updated last week
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,131 · Updated this week
- Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture. ☆3,015 · Updated last week
- Chatbot Ollama is an open source chat UI for Ollama. ☆1,700 · Updated last month
- High-speed Large Language Model Serving for Local Deployment ☆8,184 · Updated 2 months ago
- Run Mixtral-8x7B models in Colab or on consumer desktops ☆2,308 · Updated last year
- ☆1,417 · Updated last month
- Local AI API Platform ☆2,622 · Updated last week
- [ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct ☆2,016 · Updated 5 months ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,863 · Updated last year
- Stable Diffusion and Flux in pure C/C++ ☆4,023 · Updated last month
- Run GGUF models easily with a KoboldAI UI. One File. Zero Install. ☆7,119 · Updated this week
- Reaching LLaMA2 Performance with 0.1M Dollars ☆981 · Updated 9 months ago
- A collection of standardized JSON descriptors for Large Language Model (LLM) files. ☆795 · Updated 8 months ago
- Gemma open-weight LLM library, from Google DeepMind ☆3,201 · Updated last week
- Lightweight, standalone C++ inference engine for Google's Gemma models. ☆6,352 · Updated last week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference (a hedged quantization sketch follows this list). Documentation: ☆2,104 · Updated 2 weeks ago
- Run any Llama 2 locally with a Gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local llama2 backend… ☆1,959 · Updated last year
- Large Language Model Text Generation Inference ☆10,052 · Updated this week
- Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python. ☆5,930 · Updated 2 weeks ago
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ☆4,818 · Updated 2 weeks ago
- Modeling, training, eval, and inference code for OLMo ☆5,519 · Updated this week
- Large-scale LLM inference engine ☆1,395 · Updated this week
- Training LLMs with QLoRA + FSDP ☆1,472 · Updated 5 months ago
- PyTorch native post-training library ☆5,103 · Updated this week
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ☆8,432 · Updated 11 months ago
- AirLLM 70B inference with single 4GB GPU ☆5,758 · Updated 5 months ago
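
For orientation, the TensorRT-LLM entry above refers to a high-level Python API for defining and running LLMs. Below is a minimal sketch of that usage pattern, assuming the `tensorrt_llm` package is installed and a supported NVIDIA GPU is available; the model ID and sampling values are illustrative choices, not taken from the listing.

```python
# Minimal sketch of TensorRT-LLM's high-level Python API.
# Assumptions: `tensorrt_llm` is installed, a supported NVIDIA GPU is
# available, and the model ID below is an arbitrary illustrative choice.
from tensorrt_llm import LLM, SamplingParams

def main():
    # Builds (or loads a cached) TensorRT engine for the model.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # Run batched generation and print the first completion per prompt.
    for output in llm.generate(["What is retrieval augmented generation?"], params):
        print(output.outputs[0].text)

if __name__ == "__main__":
    main()
```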
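
Similarly, the AutoAWQ entry above describes 4-bit weight quantization via the AWQ algorithm. The sketch below follows AutoAWQ's quantize-then-save flow, assuming the `autoawq` and `transformers` packages are installed and a CUDA GPU is present; the model path, output directory, and config values are illustrative.

```python
# Hedged sketch of 4-bit AWQ quantization with AutoAWQ.
# Assumptions: `autoawq` and `transformers` are installed, a CUDA GPU is
# present, and the model/output paths are illustrative placeholders.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"  # illustrative source model
quant_path = "mistral-7b-awq"             # illustrative output directory

# 4-bit weights with group size 128 (a typical AWQ configuration).
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibrate on AutoAWQ's default dataset and quantize the weights.
model.quantize(tokenizer, quant_config=quant_config)

# Persist the quantized checkpoint and tokenizer for later inference.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```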