autonomi-ai / nos
⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud or AI HW.
☆144 Updated last year
Alternatives and similar repositories for nos
Users who are interested in nos are comparing it to the libraries listed below.
- Vector Database with support for late interaction and token level embeddings. ☆55 Updated 3 weeks ago
- ☆199 Updated last year
- ☆89 Updated 9 months ago
- ☆38 Updated last year
- A curated list of amazingly awesome Modal applications, demos, and shiny things. Inspired by awesome-php. ☆151 Updated last month
- LLaVA server (llama.cpp). ☆180 Updated last year
- Start a server from the MLX library. ☆188 Updated 11 months ago
- run paligemma in real time ☆131 Updated last year
- GPU prices aggregator for cloud providers ☆39 Updated 2 weeks ago
- Foyle is a copilot to help developers deploy and operate their applications. ☆131 Updated 4 months ago
- TitanML Takeoff Server is an optimization, compression and deployment platform that makes state of the art machine learning models access… ☆114 Updated last year
- Chat Markup Language conversation library ☆55 Updated last year
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆137 Updated 11 months ago
- GRDN.AI app for garden optimization ☆70 Updated last year
- Efficient vector database for hundreds of millions of embeddings. ☆206 Updated last year
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆59 Updated last week
- Python client library for improving your LLM app accuracy ☆98 Updated 5 months ago
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆145 Updated 2 months ago
- Solving data for LLMs - Create quality synthetic datasets! ☆150 Updated 5 months ago
- Aana SDK is a powerful framework for building AI enabled multimodal applications. ☆49 Updated this week
- Cerule - A Tiny Mighty Vision Model ☆66 Updated 10 months ago
- Replace expensive LLM calls with finetunes automatically ☆65 Updated last year
- Maybe the new state of the art vision model? we'll see 🤷‍♂️ ☆165 Updated last year
- Run GGML models with Kubernetes. ☆173 Updated last year
- ☆39 Updated last year
- Fast parallel LLM inference for MLX ☆198 Updated last year
- an implementation of Self-Extend, to expand the context window via grouped attention ☆119 Updated last year
- Fine-tuning and serving LLMs on any cloud ☆90 Updated last year
- ☆66 Updated last year
- Embed anything. ☆28 Updated last year