shubham0204 / llama.cpp-simple-chat-interface
Build a simple CMD chat interface with llama.cpp and C++
☆14Updated 4 months ago
Alternatives and similar repositories for llama.cpp-simple-chat-interface
Users who are interested in llama.cpp-simple-chat-interface are comparing it to the libraries listed below
- Inference Vision Transformer (ViT) in plain C/C++ with ggml☆306Updated last year
- Universal cross-platform tokenizers binding to HF and sentencepiece☆451Updated 2 weeks ago
- Streaming TTS based on Piper with optional RK3588 NPU support☆121Updated 9 months ago
- Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector…☆350Updated last year
- Using Unified Memory on Jetson☆31Updated 3 years ago
- snpe tutorial☆10Updated 2 years ago
- ☆70Updated 2 years ago
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)☆785Updated last week
- LLM deployment project based on ONNX.☆49Updated last year
- ☆10Updated last year
- Quick and Self-Contained TensorRT Custom Plugin Implementation and Integration☆79Updated 8 months ago
- A Toolkit to Help Optimize Onnx Model☆409Updated this week
- Cross-Platform Production-ready C++ inference engine for YOLO models (v5-v12, YOLO26). Unified API for detection, segmentation, pose esti…☆818Updated last week
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O☆550Updated 4 months ago
- Run Chinese MobileBert model on SNPE.☆15Updated 2 years ago
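- EasyNN is a neural network inference framework developed for teaching, designed so that anyone can write an inference framework on their own, even with zero background!☆37Updated last year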
- transformer tokenizers (e.g. BERT tokenizer) in C++ (WIP)☆18Updated 3 years ago
- Implementation of yolo v10 in c++ std 17 over opencv and onnxruntime☆90Updated last year
- YOLOv10 C++ implementation using OpenVINO for efficient and accurate real-time object detection.☆77Updated 10 months ago
- High-performance, light-weight C++ LLM and VLM Inference Software for Physical AI☆227Updated last month
- ggml study notes; ggml is a machine learning inference framework☆18Updated last year
- This is a repository to practice multi-thread programming in C++☆27Updated last year
- An open source light-weight and high performance inference framework for Hailo devices☆162Updated this week
- Efficient Inference of Transformer models☆478Updated last year
- C++ TensorRT Implementation of NanoSAM☆50Updated 2 years ago
- Triton Migration Guide for DeepStreamSDK.☆15Updated 2 years ago
- A reference application for a local AI assistant with LLM and RAG☆118Updated last year
- QAI AppBuilder is designed to help developers easily execute models on WoS and Linux platforms. It encapsulates the Qualcomm® AI Runtime …☆111Updated this week
- TensorRT C++ API Tutorial☆791Updated last year
- Let's use Qualcomm NPU in Android☆18Updated 11 months ago