kozistr / triton-grpc-proxy-rs
Proxy server, written in Rust, for a Triton gRPC server that runs inference on an embedding model
☆20 Updated 6 months ago
Alternatives and similar repositories for triton-grpc-proxy-rs:
Users who are interested in triton-grpc-proxy-rs are comparing it to the libraries listed below:
- Tiny inference-only implementation of LLaMA ☆92 Updated 10 months ago
- A high-performance batching router that optimises maximum throughput for text inference workloads ☆16 Updated last year
- GPU-accelerated client-side embeddings for vector search, RAG etc. ☆65 Updated last year
- Chat Markup Language conversation library ☆55 Updated last year
- utilities for loading and running text embeddings with onnx ☆44 Updated 6 months ago
- Tokun to can tokens ☆16 Updated this week
- an implementation of Self-Extend, to expand the context window via grouped attention ☆118 Updated last year
- ☆20 Updated 3 weeks ago
- ☆65 Updated 8 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆122 Updated 2 months ago
- Using modal.com to process FineWeb-edu data ☆20 Updated 2 months ago
- Let's create synthetic textbooks together :) ☆73 Updated last year
- Routing on Random Forest (RoRF) ☆114 Updated 4 months ago
- Unofficial python bindings for the rust llm library. 🐍❤️🦀 ☆75 Updated last year
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ☆45 Updated 4 months ago
- Binary vector search example using Unum's USearch engine and pre-computed Wikipedia embeddings from Co:here and MixedBread ☆18 Updated 10 months ago
- A miniature version of Modal ☆19 Updated 8 months ago
- Embedding models from Jina AI ☆58 Updated last year
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆136 Updated 6 months ago
- ☆48 Updated last year
- new optimizer ☆19 Updated 6 months ago
- NLP with Rust for Python 🦀🐍 ☆61 Updated 8 months ago
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆43 Updated 11 months ago
- ☆34 Updated last year
- Synthetic data derived by templating, few shot prompting, transformations on public domain corpora, and monte carlo tree search. ☆30 Updated 2 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆57 Updated 11 months ago
- Public reports detailing responses to sets of prompts by Large Language Models. ☆29 Updated last month
- inference code for mixtral-8x7b-32kseqlen ☆99 Updated last year