[EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models
☆67Sep 22, 2024Updated last year
Alternatives and similar repositories for MobileQuant
Users who are interested in MobileQuant are comparing it to the libraries listed below
- QAI AppBuilder is designed to help developers easily execute models on WoS and Linux platforms. It encapsulates the Qualcomm® AI Runtime …☆125Feb 22, 2026Updated last week
- This project is intended to build and deploy SNPE models on Qualcomm devices when the models have unsupported layers that are not part of…☆10Oct 4, 2021Updated 4 years ago
- High-speed and easy-to-use LLM serving framework for local deployment☆148Aug 7, 2025Updated 6 months ago
- Large Language Model Onnx Inference Framework☆35Nov 25, 2025Updated 3 months ago
- ☆13Jul 14, 2025Updated 7 months ago
- What Would Portland Do? Generative agent experience☆13Mar 13, 2024Updated last year
- snpe tutorial☆10Dec 25, 2023Updated 2 years ago
- An open-sourced PyTorch library for developing energy efficient multiplication-less models and applications.☆14Feb 3, 2025Updated last year
- ☆73Dec 16, 2025Updated 2 months ago
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK☆90Feb 14, 2026Updated 2 weeks ago
- [ICML 2022] ShiftAddNAS: Hardware-Inspired Search for More Accurate and Efficient Neural Networks☆15May 18, 2022Updated 3 years ago
- d-Matrix DMX Compressor: A PyTorch toolkit for nn.Module transformations supporting advanced quantization, sparsity, and elementwise func…☆21Oct 22, 2025Updated 4 months ago
- ☆14Feb 3, 2022Updated 4 years ago
- Model Quantization Benchmark☆18Sep 30, 2025Updated 5 months ago
- ☆15Sep 24, 2023Updated 2 years ago
- A project for QNN quantization and CPU inference of YOLOv5 in the Qualcomm AI Engine Direct environment☆16Sep 10, 2024Updated last year
- Tutorials on extending and importing TVM with a CMake include dependency.☆16Oct 11, 2024Updated last year
- Fast Multimodal LLM on Mobile Devices☆1,412Updated this week
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…☆39Mar 11, 2024Updated last year
- Awesome-SLM: a curated list of Small Language Models☆27Jun 24, 2024Updated last year
- Tengine Pipe (管子) is a helper tool for quickly producing demos☆12Jul 15, 2021Updated 4 years ago
- The repository supports TensorRT, QNN platform inference, 2D obstacle detection yolo series (yolov5, yolov8, yolo11, yolox), semantic seg…☆20May 6, 2025Updated 9 months ago
- ☆19Apr 3, 2025Updated 10 months ago
- Run Chinese MobileBert model on SNPE.☆15May 19, 2023Updated 2 years ago
- ☆17Dec 7, 2023Updated 2 years ago
- [NeurIPS '25] Multi-Token Prediction Needs Registers☆27Dec 14, 2025Updated 2 months ago
- A CUDA runtime environment based on the CUDA Driver API☆15Jul 30, 2025Updated 7 months ago
- Testing paligemma2 finetuning on reasoning dataset☆18Dec 28, 2024Updated last year
- WeGeFT: Weight‑Generative Fine‑Tuning for Multi‑Faceted Efficient Adaptation of Large Models☆22Jul 10, 2025Updated 7 months ago
- Self-implemented NN operators for Qualcomm's Hexagon NPU☆48Sep 30, 2025Updated 5 months ago
- [MobiSys 2020] Fast and Scalable In-memory Deep Multitask Learning via Neural Weight Virtualization☆15Jun 9, 2020Updated 5 years ago
- Efficient 3bit/4bit quantization of LLaMA models☆18May 18, 2023Updated 2 years ago
- ☆22Oct 22, 2024Updated last year
- Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation"☆21Jan 31, 2026Updated last month
- FleetRec: Large-Scale Recommendation Inference on Hybrid GPU-FPGA Clusters☆16May 26, 2021Updated 4 years ago
- Official implementation of ECCV24 paper: POA☆24Aug 8, 2024Updated last year
- Code for the paper "Function-Space Learning Rates"☆25Jun 3, 2025Updated 8 months ago
- Implementation of OpenAI paper with Simple Noise Scale on Fastai V2☆19Apr 16, 2021Updated 4 years ago
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024☆22Jun 26, 2024Updated last year