quic / ai-hub-models
The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.
☆603 · Updated last week
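For orientation, the repository's documented usage pattern is that each model ships as a Python module with a pretrained loader. A minimal sketch, assuming `qai_hub_models` is installed (`pip install qai_hub_models`) and that the `mobilenet_v2` model module exists in the installed release (module names may vary by version):

```python
# Minimal sketch: load a deployment-ready model from qai_hub_models.
# Assumes `pip install qai_hub_models` and that the mobilenet_v2 module
# is present in the installed release (module names may vary by version).
import torch
from qai_hub_models.models.mobilenet_v2 import Model

model = Model.from_pretrained()  # fetch pretrained weights
model.eval()

# Dummy forward pass; the expected input shape comes from the model's docs.
x = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    y = model(x)
print(y.shape)
```

Per the repo's README, each model module also exposes an export entry point (e.g. `python -m qai_hub_models.models.mobilenet_v2.export`) for compiling and profiling the model on real devices through Qualcomm AI Hub.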
Alternatives and similar repositories for ai-hub-models:
Users interested in ai-hub-models are comparing it to the libraries listed below.
- ☆124 · Updated 2 months ago
- The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) a… ☆140 · Updated 3 weeks ago
- Supporting PyTorch models with the Google AI Edge TFLite runtime. ☆457 · Updated this week
- LiteRT is the new name for TensorFlow Lite (TFLite). While the name is new, it's still the same trusted, high-performance runtime for on-… ☆276 · Updated this week
- Generative AI extensions for onnxruntime ☆620 · Updated this week
- On-device AI across mobile, embedded and edge for PyTorch ☆2,526 · Updated this week
- Run generative AI models with a simple C++/Python API using OpenVINO Runtime ☆220 · Updated this week
- onnxruntime-extensions: A specialized pre- and post-processing library for ONNX Runtime ☆360 · Updated this week
- Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massiv… ☆750 · Updated this week
- TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillati… ☆715 · Updated this week
- Examples for using ONNX Runtime for machine learning inferencing. ☆1,300 · Updated 3 weeks ago
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆443 · Updated this week
- ☆312 · Updated last year
- Qualcomm Cloud AI SDK (Platform and Apps) enables high-performance deep learning inference on Qualcomm Cloud AI platforms delivering high … ☆54 · Updated 3 months ago
- Neural Network Compression Framework for enhanced OpenVINO™ inference ☆976 · Updated this week
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024). ☆1,250 · Updated last week
- Conversion of PyTorch Models into TFLite ☆369 · Updated last year
- Advanced Quantization Algorithm for LLMs/VLMs. ☆372 · Updated this week
- Official implementation of Half-Quadratic Quantization (HQQ) ☆748 · Updated this week
- A PyTorch quantization backend for Optimum ☆883 · Updated last month
- An innovative library for efficient LLM inference via low-bit quantization ☆351 · Updated 5 months ago
- Common utilities for ONNX converters ☆259 · Updated 2 months ago
- Low-bit LLM inference on CPU with lookup table ☆687 · Updated last month
- Efficient Inference of Transformer models ☆422 · Updated 6 months ago
- Universal cross-platform tokenizers binding to HF and sentencepiece ☆305 · Updated 2 weeks ago
- This repository contains tutorials and examples for Triton Inference Server ☆648 · Updated this week
- Fast Multimodal LLM on Mobile Devices ☆696 · Updated last week
- A tool to modify ONNX models visually, based on Netron and Flask. ☆1,423 · Updated 2 weeks ago
- Actively maintained ONNX Optimizer ☆672 · Updated 3 weeks ago
- A text-to-image project based on the open-source Stable Diffusion V1.5 model; it produces models that can run on a phone's CPU and NPU, along with a companion model-execution framework. ☆141 · Updated 10 months ago