autonomi-ai / nos
⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud or AI HW.
☆144 Updated last year
Alternatives and similar repositories for nos
Users who are interested in nos are comparing it to the libraries listed below.
- Chat Markup Language conversation library ☆55 Updated last year
- ☆199 Updated last year
- Vector Database with support for late interaction and token level embeddings. ☆55 Updated last week
- run paligemma in real time ☆131 Updated last year
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆137 Updated last month
- GPU prices aggregator for cloud providers ☆39 Updated this week
- Synthetic Data for LLM Fine-Tuning ☆119 Updated last year
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆137 Updated 11 months ago
- Python client library for improving your LLM app accuracy ☆98 Updated 4 months ago
- An implementation of bucketMul LLM inference ☆218 Updated 11 months ago
- ☆66 Updated last year
- ☆38 Updated last year
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆59 Updated 2 months ago
- Efficient vector database for hundred millions of embeddings. ☆206 Updated last year
- Fine-tuning and serving LLMs on any cloud ☆90 Updated last year
- Enforce structured output from LLMs 100% of the time ☆249 Updated 11 months ago
- ☆182 Updated 2 months ago
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI ☆221 Updated last year
- Aana SDK is a powerful framework for building AI enabled multimodal applications. ☆47 Updated last week
- A curated list of amazingly awesome Modal applications, demos, and shiny things. Inspired by awesome-php. ☆151 Updated last week
- Foyle is a copilot to help developers deploy and operate their applications. ☆130 Updated 3 months ago
- Use context-free grammars with an LLM ☆170 Updated last year
- GPU accelerated client-side embeddings for vector search, RAG etc. ☆66 Updated last year
- Fast parallel LLM inference for MLX ☆193 Updated 11 months ago
- A simple DAG for executing LLM calls and using tools. ☆41 Updated last year
- A framework for optimizing DSPy programs with RL ☆76 Updated this week
- an implementation of Self-Extend, to expand the context window via grouped attention ☆118 Updated last year
- GRDN.AI app for garden optimization ☆70 Updated last year
- LLM family chart ☆51 Updated last year
- Run GGML models with Kubernetes. ☆173 Updated last year