kozistr / triton-grpc-proxy-rs
A proxy server, written in Rust, for a Triton gRPC server that runs inference on an embedding model
☆21Updated last year
Alternatives and similar repositories for triton-grpc-proxy-rs
Users that are interested in triton-grpc-proxy-rs are comparing it to the libraries listed below
- utilities for loading and running text embeddings with onnx☆44Updated 2 weeks ago
- A high-performance batching router that optimises maximum throughput for text inference workloads☆16Updated last year
- A guidance compatibility layer for llama-cpp-python☆36Updated last year
- ☆131Updated last year
- inference code for mixtral-8x7b-32kseqlen☆101Updated last year
- Unofficial python bindings for the rust llm library. 🐍❤️🦀☆75Updated 2 years ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.☆137Updated last year
- Tiny inference-only implementation of LLaMA☆93Updated last year
- Chat Markup Language conversation library☆55Updated last year
- GPU accelerated client-side embeddings for vector search, RAG etc.☆66Updated last year
- Simplex Random Feature attention, in PyTorch☆74Updated last year
- Transformer GPU VRAM estimator☆66Updated last year
- ☆23Updated 7 months ago
- an implementation of Self-Extend, to expand the context window via grouped attention☆118Updated last year
- Synthetic data derived by templating, few-shot prompting, transformations on public domain corpora, and Monte Carlo tree search.☆32Updated 6 months ago
- Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables☆20Updated 3 months ago
- Tools for formatting large language model prompts.☆13Updated last year
- A miniature version of Modal☆20Updated last year
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API☆45Updated 11 months ago
- Let's create synthetic textbooks together :)☆75Updated last year
- TensorRT-LLM server with Structured Outputs (JSON) built with Rust☆58Updated 4 months ago
- OpenAI compatible API for serving LLAMA-2 model☆218Updated last year
- Full finetuning of large language models without large memory requirements☆94Updated last year
- TOPLOC is a novel method for verifiable inference that enables users to verify that LLM providers are using the correct model configurat…☆40Updated 4 months ago
- Because it's there.☆16Updated 11 months ago
- The DPAB-α Benchmark☆29Updated 7 months ago
- Using modal.com to process FineWeb-edu data☆20Updated 4 months ago
- ☆46Updated last year
- Vector Database with support for late interaction and token level embeddings.☆55Updated 2 months ago
- Verbosity control for AI agents☆65Updated last year