deepsense-ai / edge-slm
This project is a native implementation of a RAG pipeline for Small Language Models, tested on Android devices. The main goal was to fit the whole RAG pipeline into a resource-constrained device, i.e. a smartphone. By design, the provided RAG library should be deployable on various platforms.
☆90Updated last year
Alternatives and similar repositories for edge-slm
Users who are interested in edge-slm are comparing it to the libraries listed below.
- Utils for Unsloth☆114Updated last week
- Self-host LLMs with vLLM and BentoML☆134Updated 2 weeks ago
- Kyutai with an "eye"☆207Updated 3 months ago
- Speech To Speech: an effort for an open-sourced and modular GPT4-o☆64Updated 9 months ago
- FRP Fork☆171Updated 3 months ago
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch☆100Updated 6 months ago
- Whisper realtime streaming for long speech-to-text transcription and translation☆120Updated last year
- An open-source chatbot architecture for voice/vision (and multimodal) assistants that runs locally (CPU/GPU bound) and remotely (I/O bound).☆59Updated this week
- Own your AI, search the web with it🌐😎☆86Updated 6 months ago
- ☆205Updated last year
- a simplified version of Google's Gemma model to be used for learning☆26Updated last year
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs☆405Updated this week
- Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector…☆295Updated 9 months ago
- NVIDIA Riva runnable tutorials☆135Updated last month
- ☆54Updated 5 months ago
- A streaming whisper server for on-prem transcription☆20Updated 11 months ago
- Real-time Voice Activity Detection (VAD) with some example use cases, such as a simple voice bot and live (real-time) transcription☆86Updated last year
- Inference, Fine Tuning and many more recipes with Gemma family of models☆242Updated 2 weeks ago
- Deploy a lightweight, full OpenAI API for production with vLLM, supporting /v1/embeddings with all embedding models.☆42Updated last year
- A collection of all available inference solutions for the LLMs☆91Updated 4 months ago
- On-device streaming text-to-speech engine powered by deep learning☆98Updated this week
- A Demo of Cache-Augmented Generation (CAG) in an LLM☆103Updated last month
- This reference can be used with any existing OpenAI integrated apps to run with TRT-LLM inference locally on GeForce GPU on Windows inste…☆122Updated last year
- A WebRTC server that allows you to interact with an LLM using your speech and responds back with generated audio.☆134Updated last year
- Easy-to-use, high-performance Knowledge Distillation for LLMs☆88Updated 2 months ago
- LLaMA 3 is one of the most promising open-source models after Mistral; we will recreate its architecture in a simpler manner.☆171Updated 10 months ago
- Multimodal AI agent with Llama 3.2: A Streamlit app that processes text, images, PDFs, and PPTs, integrating NIM microservices, Milvus, a…☆121Updated 9 months ago
- Evaluation of bm42 sparse indexing algorithm☆68Updated last year
- Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper.☆26Updated 4 months ago
- ☆57Updated 8 months ago