lx200916 / ChatBotAppLinks
☆41Updated 8 months ago
Alternatives and similar repositories for ChatBotApp
Users interested in ChatBotApp are comparing it to the libraries listed below
- Code for ACM MobiCom 2024 paper "FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices"☆57Updated 10 months ago
- The original reference implementation of a dedicated llama.cpp backend for the Qualcomm Hexagon NPU on Android phones, https://github.com/ggml…☆35Updated 5 months ago
- High-speed and easy-to-use LLM serving framework for local deployment☆139Updated 4 months ago
- ☆101Updated 3 weeks ago
- Fast Multimodal LLM on Mobile Devices☆1,277Updated last week
- LLM inference in C/C++☆48Updated this week
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK☆88Updated 2 weeks ago
- QAI AppBuilder is designed to help developers easily execute models on WoS and Linux platforms. It encapsulates the Qualcomm® AI Runtime …☆96Updated this week
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai☆106Updated last week
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models☆68Updated last year
- Awesome Mobile LLMs☆282Updated 3 weeks ago
- Penn CIS 5650 (GPU Programming and Architecture) Final Project☆44Updated 2 years ago
- This is a list of awesome edge-AI inference-related papers.☆97Updated 2 years ago
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading" [MobiCom 2022]☆19Updated 3 years ago
- LLM theoretical performance analysis tools supporting parameter, FLOPs, memory, and latency analysis.☆113Updated 5 months ago
- mperf is an operator performance tuning toolbox for mobile/embedded platforms☆192Updated 2 years ago
- ☆171Updated last week
- The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) a…☆350Updated this week
- Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA25)☆68Updated 7 months ago
- A layered, decoupled deep learning inference engine☆78Updated 10 months ago
- ⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA and CuTe APIs, achieving peak⚡️ performance.☆137Updated 7 months ago
- A llama model inference framework implemented in CUDA C++☆62Updated last year
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit☆84Updated this week
- ☆135Updated last week
- High-performance Transformer implementation in C++.☆146Updated 11 months ago
- ☆39Updated this week
- LLM inference in C/C++☆20Updated last month
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"☆94Updated 6 months ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks.☆119Updated last year
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.☆119Updated 7 months ago