kozistr / triton-grpc-proxy-rs
A proxy server, written in Rust, for a Triton gRPC server that runs inference on an embedding model
☆21 · Updated 10 months ago
Alternatives and similar repositories for triton-grpc-proxy-rs
Users that are interested in triton-grpc-proxy-rs are comparing it to the libraries listed below
- A high-performance batching router that optimises maximum throughput for text inference workloads ☆16 · Updated last year
- Unofficial Python bindings for the Rust llm library. 🐍❤️🦀 ☆75 · Updated last year
- Utilities for loading and running text embeddings with ONNX ☆44 · Updated 10 months ago
- A miniature version of Modal ☆20 · Updated last year
- ☆39 · Updated 2 years ago
- Vector Database with support for late interaction and token level embeddings. ☆55 · Updated 8 months ago
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆78 · Updated last month
- SGLang is a fast serving framework for large language models and vision language models. ☆23 · Updated 4 months ago
- Public reports detailing responses to sets of prompts by Large Language Models. ☆30 · Updated 5 months ago
- A framework for evaluating function calls made by LLMs ☆37 · Updated 11 months ago
- ☆66 · Updated last year
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ☆45 · Updated 8 months ago
- Full finetuning of large language models without large memory requirements ☆94 · Updated last year
- Using modal.com to process FineWeb-edu data ☆20 · Updated 2 months ago
- Tiny inference-only implementation of LLaMA ☆93 · Updated last year
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆137 · Updated last month
- Simple high-throughput inference library ☆119 · Updated last month
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆137 · Updated 10 months ago
- ☆35 · Updated 2 years ago
- Inference Llama 2 in one file of zero-dependency, zero-unsafe Rust ☆38 · Updated last year
- Chat Markup Language conversation library ☆55 · Updated last year
- The DPAB-α Benchmark ☆25 · Updated 5 months ago
- A Python library to orchestrate LLMs in a neural network-inspired structure ☆49 · Updated 8 months ago
- TensorRT-LLM server with Structured Outputs (JSON) built with Rust ☆55 · Updated last month
- An implementation of Self-Extend, to expand the context window via grouped attention ☆119 · Updated last year
- Generates grammar files from TypeScript for LLM generation ☆38 · Updated last year
- Sparse autoencoders for Contra text embedding models ☆25 · Updated last year
- Transformer GPU VRAM estimator ☆65 · Updated last year
- Let's create synthetic textbooks together :) ☆75 · Updated last year
- ☆61 · Updated last year