autonomi-ai / nos
⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud, or on AI hardware.
★141 · Updated 10 months ago
Alternatives and similar repositories for nos:
Users interested in nos are comparing it to the libraries listed below.
- Maybe the new state-of-the-art vision model? We'll see 🤷‍♂️ ★162 · Updated last year
- ★199 · Updated last year
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models ★136 · Updated 8 months ago
- Vector database with support for late interaction and token-level embeddings ★54 · Updated 6 months ago
- Python client library for improving your LLM app accuracy ★98 · Updated 2 months ago
- Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform ★87 · Updated this week
- Run PaliGemma in real time ★131 · Updated 11 months ago
- TypeScript generator for llama.cpp grammars, directly from TypeScript interfaces ★135 · Updated 9 months ago
- Action library for AI agents ★214 · Updated 3 weeks ago
- GRDN.AI app for garden optimization ★70 · Updated last year
- Full finetuning of large language models without large memory requirements ★94 · Updated last year
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI ★223 · Updated 11 months ago
- ★89 · Updated 6 months ago
- Foyle is a copilot to help developers deploy and operate their applications ★125 · Updated last month
- Run GGML models with Kubernetes ★173 · Updated last year
- A curated list of amazingly awesome Modal applications, demos, and shiny things. Inspired by awesome-php ★131 · Updated this week
- An implementation of bucketMul LLM inference ★216 · Updated 9 months ago
- ★39 · Updated last year
- Fast parallel LLM inference for MLX ★181 · Updated 9 months ago
- Fine-tuning and serving LLMs on any cloud ★89 · Updated last year
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ★129 · Updated 4 months ago
- Chat Markup Language conversation library ★55 · Updated last year
- Replace expensive LLM calls with finetunes automatically ★65 · Updated last year
- Landmark Attention: Random-Access Infinite Context Length for Transformers (QLoRA) ★123 · Updated last year
- A simple Python sandbox for helpful LLM data agents ★248 · Updated 10 months ago
- Python bindings for ggml ★140 · Updated 7 months ago
- ★151 · Updated 4 months ago
- Low-rank adapter extraction for fine-tuned transformer models ★171 · Updated 11 months ago
- An implementation of Self-Extend, expanding the context window via grouped attention ★119 · Updated last year
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ★178 · Updated this week