lx200916 / ChatBotApp
☆36 · Updated 3 months ago
Alternatives and similar repositories for ChatBotApp
Users interested in ChatBotApp are comparing it to the repositories listed below.
- ☆25 · Updated this week
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK ☆75 · Updated this week
- Code for ACM MobiCom 2024 paper "FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices" ☆55 · Updated 5 months ago
- LLM inference in C/C++ ☆42 · Updated this week
- Fast Multimodal LLM on Mobile Devices ☆948 · Updated last month
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆56 · Updated this week
- LLM inference in C/C++ ☆18 · Updated last week
- mperf is an operator performance tuning toolbox for mobile/embedded platforms ☆186 · Updated last year
- The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) a… ☆232 · Updated 2 weeks ago
- A layered, decoupled deep learning inference engine ☆73 · Updated 5 months ago
- Theoretical performance analysis tools for LLMs, supporting parameter-count, FLOPs, memory, and latency analysis ☆98 · Updated this week
- ☆161 · Updated last week
- Hands-on model tuning with TVM, profiled on a Mac M1, an x86 CPU, and a GTX-1080 GPU ☆48 · Updated 2 years ago
- Low-bit LLM inference on CPU/NPU with lookup table ☆823 · Updated last month
- ☆61 · Updated last month
- High-speed and easy-to-use LLM serving framework for local deployment ☆112 · Updated 3 months ago
- QAI AppBuilder is designed to help developers easily execute models on WoS and Linux platforms. It encapsulates the Qualcomm® AI Runtime … ☆55 · Updated this week
- Machine Learning Compilation course by Tianqi Chen ☆37 · Updated 2 years ago
- Penn CIS 5650 (GPU Programming and Architecture) Final Project ☆34 · Updated last year
- Demonstration of running a native LLM on an Android device ☆151 · Updated this week
- Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend ☆59 · Updated this week
- Triton Documentation in Simplified Chinese / Triton 中文文档 ☆74 · Updated 3 months ago
- Run generative AI models on Sophgo BM1684X/BM1688 ☆225 · Updated this week
- ⚡️FFPA: Extend FlashAttention-2 with Split-D, achieve ~O(1) SRAM complexity for large headdim, 1.8x~3x↑ vs SDPA.🎉 ☆192 · Updated 2 months ago
- A model serving framework for various research and production scenarios, built seamlessly upon the PyTorch and HuggingFace ecosystem ☆23 · Updated 9 months ago
- Summary of some awesome work on optimizing LLM inference ☆84 · Updated last month
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆93 · Updated this week
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆65 · Updated 9 months ago
- LLM deployment project based on ONNX ☆42 · Updated 9 months ago
- A llama model inference framework implemented in CUDA C++ ☆58 · Updated 8 months ago