autonomi-ai / nos
⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud or AI HW.
☆138 Updated 9 months ago
Alternatives and similar repositories for nos:
Users who are interested in nos are comparing it to the libraries listed below.
- run paligemma in real time ☆131 Updated 10 months ago
- ☆199 Updated last year
- ☆89 Updated 5 months ago
- Vector Database with support for late interaction and token level embeddings. ☆53 Updated 5 months ago
- ☆38 Updated last year
- A curated list of amazingly awesome Modal applications, demos, and shiny things. Inspired by awesome-php. ☆124 Updated last week
- A simple DAG for executing LLM calls and using tools. ☆41 Updated last year
- Chat Markup Language conversation library ☆55 Updated last year
- Aana SDK is a powerful framework for building AI enabled multimodal applications. ☆43 Updated last week
- Start a server from the MLX library. ☆182 Updated 7 months ago
- Fine-tuning and serving LLMs on any cloud ☆89 Updated last year
- GRDN.AI app for garden optimization ☆70 Updated last year
- A library for building software agents using behavior trees and language models. ☆80 Updated last month
- Routing on Random Forest (RoRF) ☆135 Updated 6 months ago
- Python client library for improving your LLM app accuracy ☆97 Updated last month
- Pixeltable — AI Data infrastructure providing a declarative, incremental approach for multimodal workloads. ☆164 Updated this week
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆136 Updated 7 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆125 Updated 3 months ago
- Enforce structured output from LLMs 100% of the time ☆248 Updated 8 months ago
- Replace expensive LLM calls with finetunes automatically ☆65 Updated last year
- An implementation of bucketMul LLM inference ☆215 Updated 8 months ago
- A fast batching API to serve LLM models ☆182 Updated 10 months ago
- LLaVA server (llama.cpp). ☆178 Updated last year
- Action library for AI Agent ☆211 Updated this week
- an implementation of Self-Extend, to expand the context window via grouped attention ☆118 Updated last year
- ☆39 Updated last year
- TypeScript generator for llama.cpp Grammar directly from TypeScript interfaces ☆135 Updated 8 months ago
- ☆150 Updated 3 months ago
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ☆45 Updated 5 months ago
- a lightweight, open-source blueprint for building powerful and scalable LLM chat applications ☆30 Updated 9 months ago