deepsense-ai / edge-slm
This project is a native implementation of a RAG pipeline for Small Language Models, tested on Android devices. The main goal was to fit the whole RAG pipeline onto a resource-constrained device, i.e. a smartphone. By design, the provided RAG library should be deployable on various platforms.
☆88 Updated last year
Alternatives and similar repositories for edge-slm
Users who are interested in edge-slm are comparing it to the libraries listed below
- Efficient, consistent and secure library for querying structured data with natural language ☆159 Updated last month
- ☆204 Updated last year
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆172 Updated 8 months ago
- ☆121 Updated 2 months ago
- ASR + diarization model server with speculative decoding ☆60 Updated last year
- Whisper realtime streaming for long speech-to-text transcription and translation ☆117 Updated last year
- Simple package to extract text with coordinates from programmatic PDFs ☆128 Updated this week
- This code sets up a simple yet robust server using FastAPI for handling asynchronous requests for embedding generation and reranking task… ☆69 Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆263 Updated 7 months ago
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. ☆208 Updated this week
- RAFT, or Retrieval-Augmented Fine-Tuning, is a method comprising a fine-tuning phase and a RAG-based retrieval phase. It is particularly sui… ☆119 Updated 9 months ago
- Building blocks for rapid development of GenAI applications ☆70 Updated this week
- A collection of LogitsProcessors to customize and enhance LLM behavior for specific tasks. ☆288 Updated this week
- An innovative library for efficient LLM inference via low-bit quantization ☆348 Updated 9 months ago
- Comparison of Language Model Inference Engines ☆217 Updated 5 months ago
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch ☆95 Updated 5 months ago
- Train your own small bitnet model ☆71 Updated 7 months ago
- Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform ☆86 Updated 2 weeks ago
- ☆101 Updated 9 months ago
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆69 Updated 2 weeks ago
- Utils for Unsloth ☆92 Updated this week
- Sentence Transformers API: An OpenAI compatible embedding API server ☆59 Updated 9 months ago
- ☆57 Updated 3 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 Updated 7 months ago
- This repo is for handling Question Answering, especially for Multi-hop Question Answering ☆67 Updated last year
- ONNX implementation of Whisper. PyTorch free. ☆97 Updated 6 months ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆317 Updated this week
- This reference can be used with any existing OpenAI integrated apps to run with TRT-LLM inference locally on GeForce GPU on Windows inste… ☆121 Updated last year
- ☆54 Updated 4 months ago
- LLM inference in C/C++ ☆21 Updated 2 months ago