NexaAI / Ocotpus-v2-demo
Android ChatBot with Octopus v2 - Function Calling Demo
☆15 Updated 11 months ago
Alternatives and similar repositories for Ocotpus-v2-demo
Users interested in Ocotpus-v2-demo are comparing it to the libraries listed below.
- Awesome LLMs on Device: A Comprehensive Survey ☆1,137 Updated 5 months ago
- AI for all: Build the large graph of the language models ☆269 Updated last year
- Fast Multimodal LLM on Mobile Devices ☆935 Updated 2 weeks ago
- The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) a… ☆221 Updated this week
- OLMoE: Open Mixture-of-Experts Language Models ☆792 Updated 3 months ago
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆640 Updated this week
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆711 Updated 3 months ago
- Low-bit LLM inference on CPU/NPU with lookup table ☆816 Updated 3 weeks ago
- ☆219 Updated last month
- [ICLR-2025-SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for edge devices ☆647 Updated last month
- TransMLA: Multi-Head Latent Attention Is All You Need ☆315 Updated last week
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆1,553 Updated this week
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention… ☆1,059 Updated last week
- Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Tra… ☆524 Updated last week
- For releasing code related to compression methods for transformers, accompanying our publications ☆434 Updated 5 months ago
- Production-ready LLM model compression/quantization toolkit with hardware-accelerated inference support for both CPU/GPU via HF, vLLM, and SGLa… ☆642 Updated last week
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod… ☆491 Updated 9 months ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆274 Updated last month
- [ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs. ☆823 Updated last month
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024). ☆1,305 Updated 2 months ago
- ☆233 Updated 4 months ago
- The homepage of the OneBit model quantization framework. ☆182 Updated 4 months ago
- The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Mem… ☆367 Updated last year
- Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 ☆1,345 Updated this week
- Demonstration of running a native LLM on an Android device ☆147 Updated 3 weeks ago
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆468 Updated 4 months ago
- Muon is Scalable for LLM Training ☆1,087 Updated 3 months ago
- Parallel Scaling Law for Language Models — Beyond Parameter and Inference Time Scaling ☆408 Updated last month
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V… ☆503 Updated last week
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data … ☆721 Updated 3 months ago