autonomi-ai / nos
⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud or AI HW.
☆143 · Updated 11 months ago
Alternatives and similar repositories for nos
Users that are interested in nos are comparing it to the libraries listed below
- ☆198 · Updated last year
- Maybe the new state-of-the-art vision model? We'll see 🤷‍♂️ ☆163 · Updated last year
- Run PaliGemma in real time ☆131 · Updated last year
- Vector database with support for late interaction and token-level embeddings. ☆54 · Updated 8 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆137 · Updated 10 months ago
- Action library for AI agents ☆214 · Updated 2 months ago
- An implementation of Self-Extend, to expand the context window via grouped attention ☆119 · Updated last year
- ☆89 · Updated 8 months ago
- Run GGML models with Kubernetes. ☆172 · Updated last year
- An implementation of bucketMul LLM inference ☆217 · Updated 11 months ago
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI ☆222 · Updated last year
- TitanML Takeoff Server is an optimization, compression and deployment platform that makes state of the art machine learning models access… ☆114 · Updated last year
- ☆39 · Updated last year
- Fast parallel LLM inference for MLX ☆189 · Updated 10 months ago
- LLM family chart ☆51 · Updated last year
- Chat Markup Language conversation library ☆55 · Updated last year
- A curated list of amazingly awesome Modal applications, demos, and shiny things. Inspired by awesome-php. ☆143 · Updated last week
- GRDN.AI app for garden optimization ☆70 · Updated last year
- A fast batching API to serve LLM models ☆181 · Updated last year
- Fine-tuning and serving LLMs on any cloud ☆90 · Updated last year
- AI-to-AI Testing | Simulation framework for LLM-based applications ☆137 · Updated last year
- ☆137 · Updated last year
- Embed anything. ☆28 · Updated last year
- Pixeltable: AI data infrastructure providing a declarative, incremental approach for multimodal workloads. ☆241 · Updated this week
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆58 · Updated last month
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆136 · Updated 2 weeks ago
- Python client library for improving your LLM app accuracy ☆98 · Updated 3 months ago
- ☆114 · Updated 5 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆52 · Updated last year
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆130 · Updated last month