deepsense-ai / edge-slm
This project is a native implementation of a RAG pipeline for Small Language Models, tested on Android devices. The main goal was to fit the whole RAG pipeline into a resource-constrained device, i.e., a smartphone. By design, the provided RAG library should be deployable on various platforms.
☆77 · Updated 11 months ago
Alternatives and similar repositories for edge-slm:
Users interested in edge-slm are comparing it to the libraries listed below.
- Efficient, consistent and secure library for querying structured data with natural language ☆148 · Updated 5 months ago
- ☆200 · Updated 9 months ago
- This reference can be used with any existing OpenAI-integrated apps to run with TRT-LLM inference locally on a GeForce GPU on Windows inste… ☆120 · Updated last year
- Building blocks for rapid development of GenAI applications ☆54 · Updated this week
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆157 · Updated 5 months ago
- OpenAI compatible API for TensorRT LLM triton backend ☆200 · Updated 7 months ago
- Toolkit for attaching, training, saving and loading of new heads for transformer models ☆265 · Updated last week
- On-device streaming text-to-speech engine powered by deep learning ☆72 · Updated this week
- ☆11 · Updated 9 months ago
- ☆99 · Updated 6 months ago
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch ☆85 · Updated 2 months ago
- Speech To Speech: an effort toward an open-source and modular GPT-4o ☆43 · Updated 5 months ago
- Testing LLM reasoning abilities with family relationship quizzes. ☆61 · Updated last month
- Whisper real-time streaming for long speech-to-text transcription and translation ☆112 · Updated last year
- ☆113 · Updated 5 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆136 · Updated 7 months ago
- A project that optimizes Whisper for low-latency inference using NVIDIA TensorRT ☆74 · Updated 5 months ago
- ONNX implementation of Whisper. PyTorch-free. ☆92 · Updated 3 months ago
- Banishing LLM Hallucinations Requires Rethinking Generalization ☆272 · Updated 8 months ago
- One-click templates for inferencing Language Models ☆162 · Updated this week
- Notebooks and scripts that showcase running quantized diffusion models on consumer GPUs ☆38 · Updated 4 months ago
- Deploy a light and full OpenAI API for production with vLLM, supporting /v1/embeddings with all embedding models. ☆41 · Updated 8 months ago
- ☆53 · Updated last month
- Google TPU optimizations for transformers models ☆103 · Updated last month
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆104 · Updated 3 months ago
- From-scratch implementation of a vision language model in pure PyTorch ☆200 · Updated 10 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆87 · Updated this week