deepsense-ai / edge-slm
This project is a native implementation of a RAG pipeline for Small Language Models, tested on Android devices. The main goal was to fit the whole RAG pipeline onto a resource-constrained device, i.e. a smartphone. By design, the provided RAG library should be deployable on various platforms.
☆82 Updated last year
Alternatives and similar repositories for edge-slm:
Users who are interested in edge-slm are comparing it to the libraries listed below.
- Efficient, consistent and secure library for querying structured data with natural language ☆153 Updated last week
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆277 Updated this week
- Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector… ☆255 Updated 6 months ago
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily. ☆168 Updated 3 weeks ago
- ONNX and TensorRT implementation of Whisper ☆61 Updated last year
- ☆246 Updated last week
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆56 Updated 7 months ago
- ☆204 Updated 11 months ago
- Template for professional data science and python applications made by deepsense.ai ☆31 Updated last week
- Comparison of Language Model Inference Engines ☆214 Updated 4 months ago
- Moxin is a family of fully open-source and reproducible LLMs ☆87 Updated this week
- Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft. ☆58 Updated 10 months ago
- LLM-Training-API: Including Embeddings & ReRankers, mergekit, LaserRMT ☆27 Updated last year
- Self-host LLMs with vLLM and BentoML ☆106 Updated last week
- Toolkit for attaching, training, saving and loading of new heads for transformer models ☆275 Updated last month
- ☆69 Updated last year
- Awesome Mobile LLMs ☆169 Updated last month
- A project that optimizes Whisper for low latency inference using NVIDIA TensorRT ☆81 Updated 6 months ago
- instinct.cpp provides ready to use alternatives to OpenAI Assistant API and built-in utilities for developing AI Agent applications (RAG,… ☆46 Updated 9 months ago
- Reference implementation of Mistral AI 7B v0.1 model. ☆28 Updated last year
- ☆53 Updated 10 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆295 Updated this week
- ☆100 Updated 7 months ago
- Training and Fine-tuning an llm in Python and PyTorch. ☆41 Updated last year
- This reference can be used with any existing OpenAI integrated apps to run with TRT-LLM inference locally on GeForce GPU on Windows inste… ☆120 Updated last year
- Pybind11 bindings for Whisper.cpp ☆55 Updated 3 weeks ago
- ONNX implementation of Whisper. PyTorch free. ☆94 Updated 5 months ago
- Triton backend for https://github.com/OpenNMT/CTranslate2 ☆35 Updated last year
- LoRA and DoRA from Scratch Implementations ☆202 Updated last year
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆136 Updated 9 months ago