NexaAI / Ocotpus-v2-demo
Android ChatBot with Octopus v2 - Function Calling Demo
☆15 Updated 11 months ago
Alternatives and similar repositories for Ocotpus-v2-demo
Users interested in Ocotpus-v2-demo are comparing it to the libraries listed below.
- Awesome LLMs on Device: A Comprehensive Survey ☆1,137 Updated 5 months ago
- AI for all: Build the large graph of the language models ☆269 Updated last year
- Fast Multimodal LLM on Mobile Devices ☆935 Updated 2 weeks ago
- The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) a… ☆221 Updated this week
- OLMoE: Open Mixture-of-Experts Language Models ☆792 Updated 3 months ago
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆640 Updated this week
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆711 Updated 3 months ago
- Low-bit LLM inference on CPU/NPU with lookup table ☆816 Updated 3 weeks ago
- ☆219 Updated last month
- [ICLR-2025-SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices ☆647 Updated last month
- TransMLA: Multi-Head Latent Attention Is All You Need ☆315 Updated last week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆1,553 Updated this week
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention… ☆1,059 Updated last week
- Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Tra… ☆524 Updated last week
- For releasing code related to compression methods for transformers, accompanying our publications ☆434 Updated 5 months ago
- Production-ready LLM model compression/quantization toolkit with hardware-accelerated inference support for both CPU/GPU via HF, vLLM, and SGLa… ☆642 Updated last week
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod… ☆491 Updated 9 months ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆274 Updated last month
- [ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs. ☆823 Updated last month
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024). ☆1,305 Updated 2 months ago
- ☆233 Updated 4 months ago
- The homepage of the OneBit model quantization framework. ☆182 Updated 4 months ago
- The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Mem… ☆367 Updated last year
- Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 ☆1,345 Updated this week
- Demonstration of running a native LLM on an Android device ☆147 Updated 3 weeks ago
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆468 Updated 4 months ago
- Muon is Scalable for LLM Training ☆1,087 Updated 3 months ago
- Parallel Scaling Law for Language Models — Beyond Parameter and Inference Time Scaling ☆408 Updated last month
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V… ☆503 Updated last week
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data … ☆721 Updated 3 months ago